Algorithms for Molecular Biology

Learning from positive examples when the negative class is undetermined- microRNA gene identification

Abstract

Background: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designating the negative class can be problematic: if it is not done properly, it can dramatically affect the performance of the classifier and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species.

Results: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80%, versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches that use different selected features. Using the Epstein-Barr virus (EBV) genome as an external validation of the method, we found that one-class machine learning works as well as or better than a two-class approach, both in identifying true miRNAs and in predicting new miRNAs.

Conclusion: One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when that class is not well defined. In these cases, one-class methods can be superior to two-class methods, provided the features chosen to represent the positive class are well defined.

Availability: The OneClassmiRNA program is available at: [1]
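The abstract contrasts one-class learning, which is trained on positive examples only, with two-class learning, which also needs an artificial negative set. The sketch below illustrates the one-class idea in plain Python under explicit assumptions: it is not the OneClassmiRNA implementation, the two features and all numbers are hypothetical toy values, and the model (independent per-feature Gaussians in the spirit of naïve Bayes, plus a likelihood threshold chosen to accept about 90% of the training positives) is a swapped-in illustrative technique rather than the one-class naïve Bayes or SVM variants the authors evaluated. The names `fit_one_class`, `predict`, and `accept_fraction` are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: each positive example is a pair of illustrative hairpin
# features (e.g. a normalized folding energy and a GC fraction). These numbers are
# invented for this sketch and are not taken from the paper.
positives = [
    (-0.42, 0.51), (-0.38, 0.48), (-0.45, 0.55), (-0.40, 0.50), (-0.36, 0.47),
    (-0.44, 0.53), (-0.41, 0.49), (-0.39, 0.52), (-0.43, 0.46), (-0.37, 0.54),
]


def log_likelihood(params, x):
    """Sum of per-feature Gaussian log-densities (features treated as independent)."""
    total = 0.0
    for (mu, sd), value in zip(params, x):
        z = (value - mu) / sd
        total += -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))
    return total


def fit_one_class(examples, accept_fraction=0.9):
    """Fit on positive examples only; no negative class is constructed.

    The decision threshold is the log-likelihood that keeps roughly
    `accept_fraction` of the training positives inside the accepted region.
    """
    params = []
    for j in range(len(examples[0])):
        column = [x[j] for x in examples]
        params.append((mean(column), pstdev(column) or 1e-6))  # avoid zero spread
    scores = sorted(log_likelihood(params, x) for x in examples)
    n_accept = max(1, math.ceil(accept_fraction * len(scores)))
    threshold = scores[len(scores) - n_accept]
    return params, threshold


def predict(model, x):
    """Return True if x resembles the positive (miRNA-like) class."""
    params, threshold = model
    return log_likelihood(params, x) >= threshold


model = fit_one_class(positives)
accepted = sum(predict(model, x) for x in positives)
print(accepted, "of", len(positives), "training positives accepted")  # 9 of 10
print(predict(model, (0.20, 0.10)))  # False: far from every positive example
```

Because the threshold is derived from the positives' own score distribution, no negative examples are consulted at any point. The trade-off is that `accept_fraction` directly sets how tight the accepted region is, a choice a two-class model would instead learn from its (possibly biased) negative set.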