Poly(A) motif prediction using spectral latent features from human DNA sequences

Abstract

Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA. Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.

Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance.

We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes ~30% fewer erroneous predictions relative to the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before.

© The Author 2013.
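
The abstract does not spell out the spectral algorithm, so the following is only a rough illustration of the general idea of turning a DNA sequence into latent-variable features with a hidden Markov model. It deliberately swaps in the classical forward recursion rather than the authors' spectral method, and the two-state model, its state names and all of its probabilities are hypothetical values chosen for illustration. The input is assumed to be a non-empty string over A, C, G and T.

```python
# Hypothetical 2-state HMM over nucleotides. Every number below is an
# illustrative assumption, not a parameter taken from the paper.
STATES = ("signal", "background")
START = {"signal": 0.5, "background": 0.5}
TRANS = {
    "signal": {"signal": 0.9, "background": 0.1},
    "background": {"signal": 0.1, "background": 0.9},
}
EMIT = {
    "signal": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def filtered_posteriors(seq):
    """Return P(state_t | x_1..x_t) for each position (forward recursion)."""
    posteriors = []
    prev = None
    for base in seq:
        if prev is None:
            # First position: start distribution times emission probability.
            unnorm = {s: START[s] * EMIT[s][base] for s in STATES}
        else:
            # Later positions: propagate the previous belief, then weight
            # by the emission probability of the observed base.
            unnorm = {
                s: EMIT[s][base] * sum(prev[r] * TRANS[r][s] for r in STATES)
                for s in STATES
            }
        total = sum(unnorm.values())
        prev = {s: unnorm[s] / total for s in STATES}
        posteriors.append(prev)
    return posteriors


def latent_features(seq):
    """Average the per-position posteriors into one fixed-length vector."""
    post = filtered_posteriors(seq)
    return [sum(p[s] for p in post) / len(post) for s in STATES]
```

Calling `latent_features("AATAAA")` returns a two-element vector ordered as in `STATES`. In a pipeline like the one the abstract describes, such fixed-length vectors would be the inputs to a support vector machine, which is omitted here.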
