Toward real-time data query systems in HEP

Jim Pivarski; David Lange; Thanat Jatuphattharachat


Abstract

Exploratory data analysis tools must respond quickly to a user's questions, so that the answer to one question (e.g. a visualized histogram or fit) can influence the next. In some SQL-based query systems used in industry, even very large (petabyte) datasets can be summarized on a human timescale (seconds), employing techniques such as columnar data representation, caching, indexing, and code generation/JIT-compilation. This article describes progress toward realizing such a system for High Energy Physics (HEP), focusing on the intermediate problems of optimizing data access and calculations for "query sized" payloads, such as a single histogram or group of histograms, rather than large reconstruction or data-skimming jobs. These techniques include direct extraction of ROOT TBranches into Numpy arrays and compilation of Python analysis functions (rather than SQL) to be executed very quickly. We will also discuss the problem of caching and actively delivering jobs to worker nodes that have the necessary input data preloaded in cache. All of these pieces of the larger solution are available as standalone GitHub repositories, and could be used in current analyses.
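The last technique mentioned in the abstract, delivering jobs to worker nodes that already hold the needed input in cache, can be illustrated with a minimal sketch. This is not the authors' implementation. The function `assign_jobs`, the worker names, and the partition labels are all hypothetical, and the policy shown (prefer a worker with the data cached, otherwise fall back to the least-loaded worker) is only one plausible assumption about how such a scheduler might behave.

```python
from collections import defaultdict


def assign_jobs(jobs, worker_cache):
    """Assign each job to a worker, preferring one that already caches its input.

    jobs: list of (job_id, partition) pairs (hypothetical names).
    worker_cache: dict mapping worker name -> set of cached partitions.
    Returns a dict mapping worker name -> list of job ids, in assignment order.
    """
    # Work on a copy so the caller's cache description is not modified.
    cache = {w: set(parts) for w, parts in worker_cache.items()}
    load = {w: 0 for w in cache}
    plan = defaultdict(list)

    for job_id, partition in jobs:
        # Workers that already hold this partition in cache ("warm" workers).
        warm = [w for w in cache if partition in cache[w]]
        # Fall back to all workers when nobody has the data cached.
        candidates = warm if warm else list(cache)
        # Pick the least-loaded candidate; ties are broken by worker name.
        chosen = min(candidates, key=lambda w: (load[w], w))
        plan[chosen].append(job_id)
        load[chosen] += 1
        # Assumption: running the job leaves the partition in that worker's cache.
        cache[chosen].add(partition)

    return dict(plan)


# Hypothetical cluster state: each node has one dataset partition preloaded.
workers = {"node-a": {"run1"}, "node-b": {"run2"}}
jobs = [("q1", "run1"), ("q2", "run2"), ("q3", "run1"), ("q4", "run3")]
plan = assign_jobs(jobs, workers)
print(plan)
```

In this toy run, queries over `run1` stay on `node-a` even though it becomes the busier node, while the query over the uncached `run3` goes to whichever node has the lighter load. A real system would also have to weigh cache eviction and the cost of waiting for a warm node against reading cold data elsewhere.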


Bibliographic information

Source: Journal of Physics: Conference Series, 2018, No. 3
Original format: PDF
Chinese Library Classification: Physics

