Journal: Genome Research

ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements

Abstract

Genomic sequence signals—such as base composition, presence of particular motifs, or evolutionary constraint—have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (∼94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site ().
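The abstract describes two steps: encoding multispecies alignment columns into a reduced form, and scoring a region by how well it fits the functional class versus neutral sites. The toy sketch below illustrates that general idea only. It assumes a fixed first-order Markov model and a hand-written column-to-symbol mapping, neither of which the abstract specifies, and every name in it (`encode`, `train_markov`, `log_odds_score`, `toy_mapping`) is hypothetical rather than part of the ESPERR software.

```python
import math
from collections import Counter


def encode(columns, mapping, unknown=0):
    """Map each alignment column (one character per species) to a reduced symbol."""
    return [mapping.get(col, unknown) for col in columns]


def train_markov(sequences, alphabet_size, order=1, pseudocount=1.0):
    """Estimate P(symbol | previous `order` symbols) with additive smoothing."""
    context_counts = Counter()
    transition_counts = Counter()
    for seq in sequences:
        for i in range(order, len(seq)):
            ctx = tuple(seq[i - order:i])
            context_counts[ctx] += 1
            transition_counts[(ctx, seq[i])] += 1

    def prob(ctx, sym):
        num = transition_counts[(ctx, sym)] + pseudocount
        den = context_counts[ctx] + pseudocount * alphabet_size
        return num / den

    return prob


def log_odds_score(seq, pos_prob, neg_prob, order=1):
    """Mean per-position log-odds of the positive model against the negative model."""
    total, n = 0.0, 0
    for i in range(order, len(seq)):
        ctx = tuple(seq[i - order:i])
        total += math.log(pos_prob(ctx, seq[i]) / neg_prob(ctx, seq[i]))
        n += 1
    return total / n if n else 0.0


# Assumption: a made-up two-species mapping (0 = match, 1 = mismatch, 2 = anything else).
toy_mapping = {"AA": 0, "CC": 0, "GG": 0, "TT": 0, "AG": 1, "CT": 1}

# Assumption: tiny invented training sets, already in reduced symbols.
positive_training = [[0, 0, 0, 0, 0, 1, 0, 0]]
negative_training = [[1, 2, 1, 2, 2, 1, 2, 1]]

pos_model = train_markov(positive_training, alphabet_size=3)
neg_model = train_markov(negative_training, alphabet_size=3)

conserved = encode(["AA", "CC", "GG", "TT"], toy_mapping, unknown=2)
diverged = encode(["AG", "A-", "CT", "-T"], toy_mapping, unknown=2)

print(log_odds_score(conserved, pos_model, neg_model))  # positive: resembles the positive set
print(log_odds_score(diverged, pos_model, neg_model))   # negative: resembles the negative set
```

In this sketch the mapping is fixed by hand. The method described in the abstract instead learns the encoding from training examples, which is the part this toy example leaves out.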