Journal: BMC Bioinformatics

miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM



Abstract

Background: MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. Now that thousands of miRNAs have been identified, the challenge is to explore their specific biological functions. To this end, it would be very helpful to organize these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately assigning newly found miRNAs to their corresponding families using supervised learning. Concretely, we propose miRFam, a method that classifies miRNA genes automatically using only the primary sequence information of pre-miRNAs or mature miRNAs and a multiclass SVM.
Results: We downloaded an existing miRNA family system prepared by miRBase. We first used n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. miRNAs whose families are unknown). Compared with miRBase's approach of sequence alignment followed by manual modification, our study shows that applying machine learning to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each with no fewer than 5 members), the classification accuracy is around 98%. Even on the entire miRBase15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy surprisingly reaches 90%.
Conclusions: Based on the experimental results, we argue that miRFam is suitable as an automated method of family classification, and that it is an important supplement to the existing alignment-based classification methods for small non-coding RNA (sncRNA), since it requires only primary sequence information.
Availability: The source code of miRFam, written in C++, is freely and publicly available at: http://admis.fudan.edu.cn/projects/miRFam.htm
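
The n-gram feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' C++ implementation: the function name `ngram_features`, the four-letter alphabet, the default `n`, and the normalization to relative frequencies are all assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; the abstract does not specify one


def ngram_features(sequence, n=3):
    """Return a fixed-length vector of relative n-gram frequencies.

    Hypothetical helper: the vector has len(ALPHABET) ** n entries, one per
    possible n-gram, in lexicographic order of the alphabet.
    """
    seq = sequence.upper().replace("T", "U")  # also accept DNA-style input
    # Count every overlapping window of length n in the sequence.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=n)]
    # Windows containing symbols outside ALPHABET are ignored.
    total = sum(counts[g] for g in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[g] / total for g in vocabulary]


# Illustrative input sequence (an assumption, not data from the paper).
example = "UGAGGUAGUAGGUUGUAUAGUU"
vector = ngram_features(example, n=2)
print(len(vector))
```

Each sequence becomes a vector of the same length regardless of how long the sequence is, which is what lets a standard classifier such as a multiclass SVM be trained on the vectors with family labels as targets.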
