Journal of Integrative Bioinformatics

Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection


Abstract

MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is therefore no wonder that they have been implicated in many diseases, ranging from virus infections to cancer. This impact on the phenotype has led to great interest in establishing the complete set of miRNAs of an organism. Because experimental methods are complicated, computational methods for pre-miRNA detection have been developed. Such methods generally employ machine learning to build models that discriminate between miRNAs and other sequences. Positive training data for these models stem, for the most part, from miRBase, the miRNA registry. The quality of miRBase entries has been questioned, though. This uncertain quality led to the development of filtering strategies that attempt to produce high-quality positive datasets, but such filtering can leave too little positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it can reliably establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used to establish high-quality positive datasets. This will facilitate the development of better miRNA detection tools, which in turn will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the resulting list of high-quality hairpins.
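The abstract does not list the intrinsic measures or pre-miRNA features the models rely on. As a hedged illustration only, here is a minimal sketch of the kind of sequence-level features a pre-miRNA classifier can take as input: hairpin length, GC content, and the 16 overlapping dinucleotide frequencies. The function name `hairpin_features` and the choice of features are assumptions for illustration and are not the authors' feature set. Structural features such as minimum free energy would need an RNA folding tool and are deliberately omitted.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def hairpin_features(seq: str) -> dict[str, float]:
    """Compute a few simple intrinsic sequence features for a hairpin.

    Hypothetical helper: length, GC content and dinucleotide frequencies are
    common inputs to pre-miRNA classifiers, but this is NOT necessarily the
    feature set used in the paper.
    """
    # Normalise to the upper-case RNA alphabet (DNA 'T' becomes RNA 'U').
    s = seq.upper().replace("T", "U")
    if not s or any(c not in NUCLEOTIDES for c in s):
        raise ValueError("sequence must be non-empty and contain only A, C, G, U/T")

    n = len(s)
    features: dict[str, float] = {
        "length": float(n),
        "gc_content": (s.count("G") + s.count("C")) / n,
    }

    # Relative frequency of each of the 16 dinucleotides over overlapping windows.
    pairs = [s[i:i + 2] for i in range(n - 1)]
    total = len(pairs)
    for a, b in product(NUCLEOTIDES, repeat=2):
        dinuc = a + b
        features[f"freq_{dinuc}"] = pairs.count(dinuc) / total if total else 0.0
    return features


if __name__ == "__main__":
    # Toy input, not a real miRBase hairpin.
    feats = hairpin_features("GGGAAACCCUUU")
    print(feats["length"], feats["gc_content"], feats["freq_GG"])
```

In a quality-assessment setting, such vectors would be computed for every candidate hairpin and passed to whatever classifier is being trained. Which features actually separate low- from high-quality entries is the question the paper addresses, and this sketch does not answer it.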
