Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection

Abstract

MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models for the discrimination between miRNAs and other sequences. Positive training data for model establishment, for the most part, stems from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This unknown quality led to the development of filtering strategies in attempts to produce high-quality positive datasets, which can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used for establishing high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins.

Bibliographic details

Journal: Journal of Integrative Bioinformatics
Authors: Müşerref Duygu Saçar Demirci; Jens Allmer
Year (volume), issue: 2017 (14), 2
Year: 2017
Pages: 20170032
Total pages: 11
Original format: PDF
Chinese Library Classification: Biology
Keywords: microRNA; machine learning; confidence; high quality; positive data; miRBase; MirGeneDB

Similar literature

1. Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection [J]. Müşerref Duygu Saçar Demirci, Jens Allmer. Journal of Integrative Bioinformatics. 2017, No. 2
2. Application of Machine Learning Models in Error and Variant Detection in High-Variation Genomics Datasets [J]. Maria Nisheva, Milko Krachunov, Dimitar Vassilev. Computers. 2017, No. 4
3. A transfer learning approach to improve object detection (on document-images) performance in presence of poor quality datasets [C]. Perumadura De Silva, Kolli Abhiram, Al-Sayeed Mohamad. International FLINS Conference. 2020
4. Machine Learning for the Analysis of Power System Loads: Cyber-Attack Detection and Generation of Synthetic Datasets [D]. Pinceti, Andrea. 2021
5. Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset [O]. Soulaiman Moualla, Khaldoun Khorzom, Assef Jafar. 2021
6. Establishment of Prediction Models for Lung Cancer NOG/PDX Models: A Guideline for Machine Learning in Small Biomedical Datasets [O]. Haoyue Guo, Li Diao, Hui Qi, 2020
