Journal: Scientific Reports

Efficient Genomic Interval Queries Using Augmented Range Trees

Abstract

Efficient large-scale annotation of genomic intervals is essential for personal genome interpretation in the realm of precision medicine. There are 13 possible relations between two intervals according to Allen's interval algebra. Conventional interval trees are routinely used to identify the genomic intervals satisfying a coarse relation with a query interval, but they cannot support efficient queries for more refined relations, such as all of Allen's relations. We design and implement a novel approach to address this unmet need. By rewriting Allen's interval relations, we transform an interval query into a range query and then adapt range trees to answer it. We implement two types of range trees, a basic 2-dimensional range tree (2D-RT) and an augmented range tree with fractional cascading (RTFC), and compare them with the conventional interval tree (IT). Theoretical analysis shows that, among the three trees, RTFC can achieve the best time complexity for interval queries under all of Allen's relations. We also perform comparative experiments on the efficiency of RTFC, 2D-RT and IT in querying noncoding element annotations in a large collection of personal genomes. Our experimental results show that 2D-RT is more efficient than IT for interval queries under most of Allen's relations, and that RTFC is even more efficient than 2D-RT. The results demonstrate that RTFC is an efficient data structure for querying large-scale datasets with respect to Allen's relations between genomic intervals, such as those required to interpret genome-wide variation in large populations.
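The core step described in the abstract, rewriting each Allen relation as a range query, can be illustrated with a short sketch. It is a hypothetical example, not the authors' implementation: it assumes integer coordinates and proper intervals (start < end) compared on their raw endpoints, and it implements only a basic 2D range tree (the 2D-RT variant), without fractional cascading. Each interval X becomes the point (X.start, X.end); for a query interval Q = [s, e], each of the 13 relations is then an axis-aligned rectangle in that plane (for example, X overlaps Q exactly when X.start < s and s < X.end < e). The names `RELATIONS`, `build`, `query` and `AllenRangeTree` are illustrative only.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

INF = math.inf  # stands in for an unbounded side of a query rectangle

# Each Allen relation "X rel Q" for a query Q = [s, e] becomes a rectangle over
# points (x.start, x.end), given as inclusive bounds (x_lo, x_hi, y_lo, y_hi).
# Assumes integer coordinates and proper intervals (start < end), so a strict
# "< v" is written as "<= v - 1" (hypothetical convention for this sketch).
RELATIONS = {
    "before":        lambda s, e: (-INF, INF, -INF, s - 1),     # x.end < s
    "after":         lambda s, e: (e + 1, INF, -INF, INF),      # x.start > e
    "meets":         lambda s, e: (-INF, INF, s, s),            # x.end == s
    "met_by":        lambda s, e: (e, e, -INF, INF),            # x.start == e
    "overlaps":      lambda s, e: (-INF, s - 1, s + 1, e - 1),  # x.start < s < x.end < e
    "overlapped_by": lambda s, e: (s + 1, e - 1, e + 1, INF),   # s < x.start < e < x.end
    "starts":        lambda s, e: (s, s, -INF, e - 1),          # same start, x.end < e
    "started_by":    lambda s, e: (s, s, e + 1, INF),           # same start, x.end > e
    "during":        lambda s, e: (s + 1, INF, -INF, e - 1),    # s < x.start, x.end < e
    "contains":      lambda s, e: (-INF, s - 1, e + 1, INF),    # x.start < s, x.end > e
    "finishes":      lambda s, e: (s + 1, INF, e, e),           # same end, x.start > s
    "finished_by":   lambda s, e: (-INF, s - 1, e, e),          # same end, x.start < s
    "equals":        lambda s, e: (s, s, e, e),                 # identical endpoints
}


@dataclass
class Node:
    x_min: int                 # smallest start in this subtree
    x_max: int                 # largest start in this subtree
    ends: List[int]            # ends of this subtree's intervals, sorted
    ids: List[int]             # interval ids aligned with `ends`
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pts: List[Tuple[int, int, int]]) -> Optional[Node]:
    """Build a 2D range tree from (start, end, id) tuples sorted by start."""
    if not pts:
        return None
    by_end = sorted(pts, key=lambda p: p[1])  # secondary structure: sorted by end
    node = Node(pts[0][0], pts[-1][0], [p[1] for p in by_end], [p[2] for p in by_end])
    if len(pts) > 1:
        mid = len(pts) // 2
        node.left, node.right = build(pts[:mid]), build(pts[mid:])
    return node


def query(node, x_lo, x_hi, y_lo, y_hi, out: List[int]) -> None:
    """Collect ids with x_lo <= start <= x_hi and y_lo <= end <= y_hi."""
    if node is None or node.x_max < x_lo or node.x_min > x_hi:
        return  # every start in this subtree lies outside [x_lo, x_hi]
    if x_lo <= node.x_min and node.x_max <= x_hi:
        # Canonical node: every start qualifies, so binary-search the ends.
        lo = bisect.bisect_left(node.ends, y_lo)
        hi = bisect.bisect_right(node.ends, y_hi)
        out.extend(node.ids[lo:hi])
        return
    query(node.left, x_lo, x_hi, y_lo, y_hi, out)
    query(node.right, x_lo, x_hi, y_lo, y_hi, out)


class AllenRangeTree:
    def __init__(self, intervals: List[Tuple[int, int]]):
        pts = sorted((s, e, i) for i, (s, e) in enumerate(intervals))
        self.root = build(pts)

    def find(self, relation: str, qs: int, qe: int) -> List[int]:
        """Ids of intervals X such that `X relation [qs, qe]` holds."""
        x_lo, x_hi, y_lo, y_hi = RELATIONS[relation](qs, qe)
        out: List[int] = []
        query(self.root, x_lo, x_hi, y_lo, y_hi, out)
        return sorted(out)


intervals = [(1, 5), (3, 8), (5, 9), (10, 12), (3, 4), (4, 6), (5, 7)]
tree = AllenRangeTree(intervals)
print(tree.find("overlaps", 4, 9))  # → [0, 1]
print(tree.find("during", 4, 9))    # → [6]
```

In this sketch a query splits the start range into O(log n) canonical subtrees and binary-searches the sorted ends in each, giving O(log² n + k) time when k intervals are reported. Fractional cascading, the technique named for RTFC, is the standard way to replace those per-node binary searches with a single search at the root, lowering the bound to O(log n + k). For brevity the per-node end lists are built by re-sorting, which costs O(n log² n); merging the children's lists would bring construction down to O(n log n).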