IEEE International Conference on Data Engineering

Multi-Dimensional Genomic Data Management for Region-Preserving Operations


Abstract

In previous work, we presented the GenoMetric Query Language (GMQL), an algebraic language for querying genomic datasets, supported by the Genomic Data Management System (GDMS), an open-source big data engine implemented on top of Apache Spark. GMQL datasets are represented as genomic regions (i.e., intervals of the genome delimited by a start and a stop position), each with an associated value representing the signal for that region; the most typical signals are gene expressions, expression peaks, and variants relative to a reference genome. GMQL can process queries over billions of regions, organized within distinct datasets. In this paper, we focus on the efficient execution of region-preserving GMQL operations, in which the regions of the result are a subset of the regions of one of the operands; most GMQL operations are region-preserving. Chains of region-preserving operations can be executed efficiently by taking advantage of an array-based data organization, in which region management is separated from value management. We discuss this optimization in the context of the current GDMS system, which has a row-based (relational) organization and therefore requires dynamic data transformations. A similar approach applies to other application domains with interval-based data organization.
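To illustrate the idea of separating region management from value management, here is a minimal Python sketch. It is not the GDMS implementation and does not use GMQL syntax; the class `RegionArray` and its methods `from_rows`, `select_values`, `select_regions`, and `to_rows` are hypothetical names introduced only for this example. Coordinates and values are held in parallel arrays, and each region-preserving operation only narrows a list of selected indices, so a chain of such operations never copies or rewrites the regions themselves.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Region = Tuple[str, int, int]          # (chromosome, start, stop)
Row = Tuple[str, int, int, float]      # row-based form: region plus its value


@dataclass
class RegionArray:
    """Hypothetical array-based layout: regions and values in parallel arrays."""
    regions: List[Region]   # coordinates, stored once and shared along a chain
    values: List[float]     # signal value for the region at the same index
    selected: List[int]     # indices of the regions still in the result

    @classmethod
    def from_rows(cls, rows: List[Row]) -> "RegionArray":
        # Transformation from the row-based form into the array-based form.
        regions = [(c, s, e) for c, s, e, _ in rows]
        values = [v for _, _, _, v in rows]
        return cls(regions, values, list(range(len(rows))))

    def select_values(self, pred: Callable[[float], bool]) -> "RegionArray":
        # Region-preserving: keeps a subset of regions, touches only values.
        keep = [i for i in self.selected if pred(self.values[i])]
        return RegionArray(self.regions, self.values, keep)

    def select_regions(self, pred: Callable[[Region], bool]) -> "RegionArray":
        # Region-preserving: keeps a subset of regions based on coordinates.
        keep = [i for i in self.selected if pred(self.regions[i])]
        return RegionArray(self.regions, self.values, keep)

    def to_rows(self) -> List[Row]:
        # Transformation back to the row-based form, done once at the end.
        return [(*self.regions[i], self.values[i]) for i in self.selected]


# Made-up example data, not taken from any real dataset.
rows = [
    ("chr1", 100, 200, 0.5),
    ("chr1", 150, 300, 2.0),
    ("chr2", 50, 80, 3.5),
    ("chr2", 400, 900, 0.1),
]

dataset = RegionArray.from_rows(rows)
result = (
    dataset
    .select_values(lambda v: v > 1.0)          # keep strong signals
    .select_regions(lambda r: r[0] == "chr2")  # keep regions on chr2
)
print(result.to_rows())  # [('chr2', 50, 80, 3.5)]
```

Both steps in the chain return regions that are a subset of the input regions, which is the property the abstract calls region-preserving. The conversion to and from rows happens only at the boundaries of the chain, which is one plausible reading of the "dynamic data transformations" a row-based engine would need.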
