Nucleic Acids Research

Repeat or not repeat?—Statistical validation of tandem repeat prediction in genomic sequences

Abstract

Tandem repeats (TRs) are among the most prevalent features of genomic sequences. Owing to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite this longstanding interest, TR detection remains unresolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, which reflect characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power to detect TRs depends on their degree of divergence and on repeat characteristics such as the length of the minimal repeat unit and the number of units in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier that entails a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with state-of-the-art detectors, our statistical classification scheme for inferred repeats makes it possible to filter out false-positive predictions. Since different algorithms appear to specialize in predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.
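The abstract does not give the classifier's details. To illustrate the general idea of scoring predicted repeat units by a maximum-likelihood estimate of their divergence and then filtering pooled predictions from several detectors, here is a minimal Python sketch built on a deliberately simplified model; it is not the authors' phylogenetic classifier. It assumes the repeat units are already aligned to equal length, treats their majority-rule consensus as the known ancestor of a star phylogeny with one shared branch length, and uses the Jukes-Cantor (JC69) substitution model. Under those assumptions the ML branch length is d = -3/4 ln(1 - 4p/3), where p is the pooled fraction of mismatches against the consensus. The function names, the `max_divergence` cutoff of 0.5, and the detector names in the usage are hypothetical placeholders.

```python
import math
from collections import Counter


def consensus(units):
    """Majority-rule consensus of aligned repeat units ('-' marks a gap)."""
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("repeat units must be aligned to equal length")
    columns = []
    for column in zip(*units):
        counts = Counter(c for c in column if c != "-")
        # most_common breaks ties by first occurrence in the column.
        columns.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(columns)


def jc69_divergence(units):
    """ML substitutions per site under JC69 (simplified star tree around the consensus)."""
    cons = consensus(units)
    sites = mismatches = 0
    for unit in units:
        for a, b in zip(unit, cons):
            if a == "-" or b == "-":
                continue  # gapped positions are ignored in this simplified model
            sites += 1
            mismatches += a != b
    if sites == 0:
        return math.inf
    p = mismatches / sites  # pooled mismatch fraction, the ML estimate of p
    if p >= 0.75:
        return math.inf  # saturated: the JC69 distance is undefined
    return -0.75 * math.log1p(-4.0 * p / 3.0)


def is_plausible_repeat(units, max_divergence=0.5):
    """Hypothetical filter: keep a predicted TR only if its units are similar enough."""
    return len(units) >= 2 and jc69_divergence(units) <= max_divergence


def pool_and_filter(predictions_by_detector, max_divergence=0.5):
    """Pool predictions from several detectors and drop implausible ones.

    predictions_by_detector (assumed layout): detector name -> list of
    (start, end, aligned_units) tuples.
    Returns {(start, end): sorted list of detectors supporting that span}.
    """
    supported = {}
    for detector, predictions in predictions_by_detector.items():
        for start, end, units in predictions:
            if is_plausible_repeat(units, max_divergence):
                supported.setdefault((start, end), set()).add(detector)
    return {span: sorted(tools) for span, tools in sorted(supported.items())}


if __name__ == "__main__":
    predictions = {
        "toolA": [(10, 22, ["ACGT", "ACGT", "ACGA"])],
        "toolB": [(10, 22, ["ACGT", "ACGT", "ACGA"]),
                  (40, 52, ["ACGT", "TGCA", "GTAC"])],
    }
    print(pool_and_filter(predictions))
```

In this toy input the first span (one mismatch in twelve sites, d of about 0.088) is kept and credited to both tools, while the second span's units share almost nothing (d of about 1.65) and are dropped. Because the consensus is estimated from the same units it is compared against, this shortcut tends to underestimate divergence; a full phylogenetic treatment would estimate the tree instead of fixing it.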