Journal: Publications of the Astronomical Society of the Pacific

Column Store for GWAC: A High-cadence, High-density, Large-scale Astronomical Light Curve Pipeline and Distributed Shared-nothing Database


Abstract

The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 deg² every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ~175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short-timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real time, and design a distributed shared-nothing database to manage the massive amount of archived data, which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS MonetDB, taking advantage of MonetDB's high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantics) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.
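To make the ingest rate implied by these figures concrete, the following is a minimal back-of-the-envelope sketch. It uses only the numbers quoted in the abstract (36 cameras, roughly 175,600 sources per camera image, one exposure every 15 s). The variable names are illustrative, and reading "4k" as 4096 pixels is an assumption, not something the abstract states.

```python
# Figures taken from the abstract.
CAMERAS = 36                   # "36 x 4k x 4k pixels" per exposure
SOURCES_PER_CAMERA = 175_600   # typical extracted sources per camera image
CADENCE_S = 15                 # one exposure every 15 s

# Assumption (not stated in the abstract): "4k" means 4096 pixels per side.
PIXELS_PER_CAMERA = 4096 * 4096

# Sources the pipeline must associate for each exposure of the full array.
sources_per_exposure = CAMERAS * SOURCES_PER_CAMERA

# Sustained rate needed to keep up with the cadence in real time.
sources_per_second = sources_per_exposure / CADENCE_S

# Raw pixel count per exposure across all cameras.
pixels_per_exposure = CAMERAS * PIXELS_PER_CAMERA

print(f"sources per exposure: {sources_per_exposure:,}")
print(f"sources per second:   {sources_per_second:,.0f}")
print(f"pixels per exposure:  {pixels_per_exposure:,}")
```

Under these figures, the source association step would have to handle about 6.3 million sources within each 15 s window, which suggests why the abstract emphasizes optimizing that function.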
