A Machine Learning Approach for Accurate Annotation of Noncoding RNAs

Abstract

Searching genomes to locate noncoding RNA genes with known secondary structure is an important problem in bioinformatics. In general, the secondary structure of a searched noncoding RNA is defined with a structure model constructed from the structural alignment of a set of sequences from its family. Computing the optimal alignment between a sequence and a structure model is the core part of an algorithm that can search genomes for noncoding RNAs. In practice, a single structure model may not be sufficient to capture all crucial features important for a noncoding RNA family. In this paper, we develop a novel machine learning approach that can efficiently search genomes for noncoding RNAs with high accuracy. During the search procedure, a sequence segment in the searched genome sequence is processed and a feature vector is extracted to represent it. Based on the feature vector, a classifier is used to determine whether the sequence segment is the searched ncRNA or not. Our testing results show that this approach is able to efficiently capture crucial features of a noncoding RNA family. Compared with existing search tools, it significantly improves the accuracy of genome annotation.
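The search procedure described above (take a sequence segment, extract a feature vector, let a classifier decide) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the features (GC content plus dinucleotide frequencies), the nearest-centroid classifier, and the names `feature_vector`, `NearestCentroid`, and `scan` are hypothetical. In particular, this sketch ignores secondary structure, which the abstract says the actual features are meant to capture.

```python
from itertools import product

ALPHABET = "ACGU"
DINUCS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 16 dinucleotides


def feature_vector(segment):
    """Hypothetical features: GC content followed by 16 dinucleotide frequencies."""
    seg = segment.upper().replace("T", "U")  # treat DNA input as RNA
    n = len(seg)
    gc = sum(seg.count(c) for c in "GC") / n if n else 0.0
    counts = {d: 0 for d in DINUCS}
    for i in range(n - 1):
        pair = seg[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total_pairs = max(n - 1, 1)
    return [gc] + [counts[d] / total_pairs for d in DINUCS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in classifier (an assumption, not the paper's classifier)."""

    def fit(self, positives, negatives):
        self.pos = centroid([feature_vector(s) for s in positives])
        self.neg = centroid([feature_vector(s) for s in negatives])
        return self

    def predict(self, segment):
        v = feature_vector(segment)
        return sq_dist(v, self.pos) < sq_dist(v, self.neg)


def scan(genome, clf, window, step=1):
    """Slide a fixed-size window over the genome and return hit start positions."""
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        if clf.predict(genome[start:start + window]):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy training data, invented purely for the example.
    clf = NearestCentroid().fit(
        positives=["GGGGCCCC", "GCGCGCGC"],
        negatives=["AAAAUUUU", "AUAUAUAU"],
    )
    print(scan("AAAAAAAAGGCCGGCCAAAAAAAA", clf, window=8, step=8))
```

A realistic system would replace the toy features with ones derived from alignment against a structure model and would train the classifier on known family members and background sequence; the sketch only shows how the three stages fit together.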