Journal: Nucleic Acids Research

Limitations and potentials of current motif discovery algorithms

Abstract

Computational methods for de novo identification of gene regulatory elements, such as transcription factor binding sites, have proved useful for deciphering genetic regulatory networks. However, despite the large number of available algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms on large datasets generated from Escherichia coli RegulonDB. We characterize the factors that affect prediction accuracy, scalability and reliability. Accuracy at the nucleotide and binding-site levels is very low, whereas accuracy at the motif level is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit the diverse predictions from multiple runs of one or more algorithms, we developed a consensus ensemble algorithm, which achieved a 6–45% improvement over the base algorithms by increasing both sensitivity and specificity. Our study illustrates the limitations and potential of existing sequence-based motif discovery algorithms, and we discuss several promising directions for further improvement that build on the revealed potential. Because sequence-based algorithms are the baseline for most modern motif discovery algorithms, these results suggest that substantial improvements to them are possible.
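
The gap between nucleotide-level and binding-site-level accuracy can be illustrated with a minimal sketch. The abstract does not give the paper's exact metric definitions, so the code below assumes commonly used ones: sensitivity (true positives over known) and positive predictive value (true positives over predicted), the latter standing in for what the abstract calls specificity. Sites are assumed to be half-open intervals [start, end) on a single sequence, and the function names and the `min_overlap` parameter are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a site is a half-open interval [start, end) on one sequence.
Site = Tuple[int, int]


def covered_positions(sites: List[Site]) -> Set[int]:
    """Return the set of nucleotide positions covered by any site."""
    positions: Set[int] = set()
    for start, end in sites:
        positions.update(range(start, end))
    return positions


def nucleotide_level(known: List[Site], predicted: List[Site]) -> Tuple[float, float]:
    """Sensitivity and positive predictive value counted per nucleotide."""
    known_pos = covered_positions(known)
    pred_pos = covered_positions(predicted)
    true_pos = len(known_pos & pred_pos)
    sensitivity = true_pos / len(known_pos) if known_pos else 0.0
    ppv = true_pos / len(pred_pos) if pred_pos else 0.0
    return sensitivity, ppv


def overlap(a: Site, b: Site) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def site_level(known: List[Site], predicted: List[Site],
               min_overlap: int = 1) -> Tuple[float, float]:
    """Sensitivity and PPV counted per site.

    A known site counts as found if some predicted site overlaps it by at
    least min_overlap positions (a hypothetical threshold), and vice versa.
    """
    found = sum(1 for k in known
                if any(overlap(k, p) >= min_overlap for p in predicted))
    correct = sum(1 for p in predicted
                  if any(overlap(k, p) >= min_overlap for k in known))
    sensitivity = found / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    known = [(10, 20), (40, 50)]
    predicted = [(15, 25), (70, 80)]
    print(nucleotide_level(known, predicted))
    print(site_level(known, predicted))
```

In this toy input, one predicted site partially overlaps one of two known sites, so the per-nucleotide scores are lower than the per-site scores. A motif-level measure that only asks whether at least one site was hit would be more lenient still, which is one plausible reading of why the abstract reports higher motif-level accuracy.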
