Journal: BMC Bioinformatics

A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

Abstract

Background Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.
Results The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.
Conclusion We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.
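
The evaluation loop described in the abstract (simulated assay data with causal and irrelevant features, measurement noise, filter-based feature selection, and K-way cross-validation) can be illustrated with a minimal sketch. This is not the authors' simulation model or code: the data generator, the scoring rule used for the filter, and all function names (`simulate`, `filter_select`, `knn_predict`, `cross_validate`) are assumptions made for illustration, and only KNN with k = 5 is shown because it needs no external library.

```python
import random
import statistics

def simulate(n_samples, n_causal, n_irrelevant, noise_sd, rng):
    """Toy generator (an assumption, not the paper's model):
    the label is 1 when the sum of the causal features is positive."""
    X, y = [], []
    for _ in range(n_samples):
        causal = [rng.gauss(0, 1) for _ in range(n_causal)]
        label = 1 if sum(causal) > 0 else 0
        # Measurement noise is added after the label is fixed.
        noisy = [v + rng.gauss(0, noise_sd) for v in causal]
        irrelevant = [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        X.append(noisy + irrelevant)
        y.append(label)
    return X, y

def filter_select(X, y, n_keep):
    """Filter-based selection: rank each feature by the absolute
    difference of class means divided by its overall spread."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        if not pos or not neg:
            scores.append((0.0, j))
            continue
        spread = statistics.pstdev(pos + neg) or 1.0
        diff = abs(statistics.fmean(pos) - statistics.fmean(neg))
        scores.append((diff / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, k_folds, n_keep=None):
    """K-way cross-validation accuracy; features are selected on the
    training part of each fold only, to avoid leaking test labels."""
    n = len(X)
    correct = 0
    for fold in range(k_folds):
        test_idx = set(range(fold, n, k_folds))
        tr_X = [X[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        if n_keep:
            cols = filter_select(tr_X, tr_y, n_keep)
        else:
            cols = list(range(len(X[0])))
        tr_sel = [[row[j] for j in cols] for row in tr_X]
        for i in test_idx:
            pred = knn_predict(tr_sel, tr_y, [X[i][j] for j in cols])
            correct += pred == y[i]
    return correct / n

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = simulate(200, 3, 50, 0.3, rng)
    print("all features:", cross_validate(X, y, 5))
    print("top 5 by filter:", cross_validate(X, y, 5, n_keep=5))
```

Varying `n_irrelevant` and `noise_sd` in this sketch mirrors, in a very reduced form, the kind of sweep the abstract describes; the numbers it prints say nothing about the paper's reported results.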
