Journal: Nanotoxicology

Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches

Abstract

The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, the models they generate are often very difficult to interpret. New computational modelling tools, or new ways of using existing tools, are required to model the relatively sparse and sometimes lower-quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree), to nanoSAR modelling. We demonstrate the use of GPTree in constructing accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equal or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generating accurate nanoSAR models, with the important advantages that it works with small datasets, selects descriptors automatically, and yields models that are significantly easier to interpret.
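The abstract does not spell out GPTree's internals. As a rough illustration of the general idea only, the sketch below evolves decision-tree classifiers with a generic genetic-programming loop: random tree initialization, subtree crossover and mutation, tournament selection, and elitism, with each tree scored by training accuracy minus a size penalty that favours small, readable trees. Descriptor selection happens implicitly, since only features that appear in a split are used by the final model. All function names, the parsimony weight `alpha`, and the operator rates are hypothetical choices for this example; this is not the authors' GPTree implementation.

```python
import random

# Hypothetical representation: a tree is either a leaf (an int class label)
# or a tuple (feature_index, threshold, left_subtree, right_subtree).

def predict(tree, x):
    """Route sample x down the tree: x[f] <= t goes left, otherwise right."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def size(tree):
    """Number of nodes; used as a parsimony (interpretability) penalty."""
    if isinstance(tree, tuple):
        return 1 + size(tree[2]) + size(tree[3])
    return 1

def used_features(tree):
    """Descriptors that the tree actually splits on (implicit feature selection)."""
    if isinstance(tree, tuple):
        return {tree[0]} | used_features(tree[2]) | used_features(tree[3])
    return set()

def random_tree(X, classes, depth, rng):
    """Grow a random tree; stops early with probability 0.3 at each node.
    Split thresholds are drawn from observed values of the chosen feature."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(classes)
    f = rng.randrange(len(X[0]))
    t = rng.choice(X)[f]
    return (f, t, random_tree(X, classes, depth - 1, rng),
            random_tree(X, classes, depth - 1, rng))

def fitness(tree, X, y, alpha=0.01):
    """Training accuracy minus a small penalty per node (hypothetical weighting)."""
    acc = sum(predict(tree, xi) == yi for xi, yi in zip(X, y)) / len(y)
    return acc - alpha * size(tree)

def node_paths(tree, path=()):
    """Yield the path (tuple of child indices 2 or 3) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from node_paths(tree[2], path + (2,))
        yield from node_paths(tree[3], path + (3,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    pa = rng.choice(list(node_paths(a)))
    pb = rng.choice(list(node_paths(b)))
    return replace_subtree(a, pa, get_subtree(b, pb))

def mutate(tree, X, classes, rng, max_depth=3):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    p = rng.choice(list(node_paths(tree)))
    return replace_subtree(tree, p, random_tree(X, classes, max_depth, rng))

def tournament(pop, scores, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return pop[max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])]

def evolve(X, y, pop_size=60, generations=40, seed=0):
    """Return the best tree found and the best fitness recorded each generation."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    pop = [random_tree(X, classes, 3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):
        scores = [fitness(t, X, y) for t in pop]
        best = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        if len(history) > generations:
            break
        nxt = [pop[best]]  # elitism: the current best tree always survives
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            if rng.random() < 0.2:
                child = mutate(child, X, classes, rng)
            nxt.append(child)
        pop = nxt
    return pop[best], history
```

A toy usage on made-up data (not from the paper): with `X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]` and `y = [1 if row[0] > 0.5 else 0 for row in X]`, calling `tree, history = evolve(X, y)` returns a tree whose `used_features(tree)` names the descriptors it relies on. The size penalty is one simple way to trade accuracy for interpretability; elitism guarantees that the best recorded fitness never decreases between generations.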
