Journal of Molecular Graphics & Modelling

In silico prediction of toxic action mechanisms of phenols for imbalanced data with random forest learner


Abstract

With the growing need for rapid and effective safety assessment of compounds in industrial and civil-use products, in silico toxicity exploration techniques provide an economical route to environmental hazard assessment. Over the last decade, in silico studies have developed many quantitative structure-activity relationship (QSAR) models to predict toxicity mechanisms. Most of these methods draw on data analysis and machine learning techniques, which rely heavily on the characteristics of the data sets. For Tetrahymena pyriformis toxicity data sets, data imbalance poses a major technical challenge: the skewed class distribution greatly degrades prediction performance on rare classes. Most previous studies on predicting the mechanisms of toxic action of phenols did not consider this practical problem. In this work, we addressed it by accounting for the difference between the two types of misclassification. A Random Forest learner was employed within a cost-sensitive learning framework to construct prediction models based on selected molecular descriptors. In computational experiments, both the global and local models achieved appreciable overall prediction accuracy, and performance on the rare classes in particular improved. Moreover, for practical use of these models, the balance between the two types of misclassification can be adjusted by choosing different cost matrices according to the application goals.
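
To illustrate how a cost matrix can shift the balance between the two types of misclassification, the following is a minimal sketch of a minimum-expected-cost decision rule. It is not the authors' implementation: the function name, the class probabilities, and the cost values are all hypothetical, and the probabilities are assumed to come from some classifier (for example, the vote fractions of a random forest).

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    # Expected cost of predicting j, averaged over the possible true classes.
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical estimates for one compound: class 0 = common, class 1 = rare.
probs = [0.7, 0.3]

# Equal costs reproduce the ordinary "most probable class" decision.
equal_cost = [[0, 1],
              [1, 0]]

# Assumption: missing a rare-class compound costs 5x as much as a false alarm.
skewed_cost = [[0, 1],
               [5, 0]]

print(min_expected_cost_class(probs, equal_cost))
print(min_expected_cost_class(probs, skewed_cost))
```

With equal costs the compound is assigned to the common class, whereas the skewed matrix moves the same compound to the rare class. Changing the off-diagonal entries is one simple way to trade one kind of error against the other.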
