Conference proceedings: Scientific and Statistical Database Management

SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases


Abstract

Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs, based on spherical coordinates and other properties. Because of the large data volumes and the spherical geometry, the symmetric N-way association of astronomical detections is computationally intensive, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects, each detected multiple times at different times. A one-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is needed that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. Cross-identification problems are formulated in a language based on SQL and extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. The workflows are then executed in a parallel framework on a cluster of servers hosting identical mirrors of the same data sets.
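The idea of partitioning a cross-identification by coordinate ranges can be illustrated with a small sketch. This is not the SkyQuery implementation: it uses plain Python instead of SQL, and it replaces the paper's probabilistic join with a simple fixed-radius cut on angular separation. The function names, the tuple layout `(id, ra_deg, dec_deg)`, and the declination "zone" scheme are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def zone_of(dec_deg, zone_height_deg):
    """Index of the declination band (hypothetical partitioning scheme)."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def cross_match(cat_a, cat_b, radius_deg, zone_height_deg=1.0):
    """Pair detections from two catalogs that lie within radius_deg.

    cat_a, cat_b: lists of (id, ra_deg, dec_deg) tuples.
    Returns a list of (id_a, id_b, separation_deg).
    """
    # Bucket the second catalog by declination zone so that each
    # detection is compared only against nearby candidates.
    zones_b = defaultdict(list)
    for obj in cat_b:
        zones_b[zone_of(obj[2], zone_height_deg)].append(obj)

    matches = []
    for id_a, ra_a, dec_a in cat_a:
        # Look into every zone that the search circle can overlap.
        lo = zone_of(dec_a - radius_deg, zone_height_deg)
        hi = zone_of(dec_a + radius_deg, zone_height_deg)
        for z in range(lo, hi + 1):
            for id_b, ra_b, dec_b in zones_b.get(z, ()):
                sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
                if sep <= radius_deg:
                    matches.append((id_a, id_b, sep))
    return matches


if __name__ == "__main__":
    catalog_a = [("a1", 10.0, 20.0), ("a2", 359.9999, -0.0001)]
    catalog_b = [("b1", 10.0001, 20.0001), ("b2", 0.0001, 0.0001), ("b3", 50.0, 50.0)]
    for id_a, id_b, sep in cross_match(catalog_a, catalog_b, radius_deg=0.001):
        print(id_a, id_b, f"{sep:.6f}")
```

Because each declination zone only needs its own detections plus a margin of one search radius, the zones could in principle be handed to separate workers, which is the property a spatially partitioned, parallel workflow relies on.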
