LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity-Application to the Tox21 and Mutagenicity Data Sets

Zhang Jin; Mucs Daniel; Norinder Ulf; Svensson Fredrik

Abstract

Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their greater speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its application in predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and computational-time problems by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity data sets using a nested 10-fold cross-validation scheme with integrated Bayesian optimization, which performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm, offering the best predictive performance while requiring significantly less computational time than the other investigated algorithms across all Tox21 and mutagenicity data sets. We recommend LightGBM for applications in in silico safety assessment and in other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity- or activity-related end points of large compound libraries present in the pharmaceutical and chemical industry.
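
The nested cross-validation scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a toy k-nearest-neighbour classifier in place of LightGBM, a plain grid search in place of Bayesian optimization, and synthetic one-dimensional data in place of Tox21. All function names (`k_fold_splits`, `knn_predict`, `nested_cv`) are hypothetical. The point it shows is structural: hyperparameters are chosen in an inner loop that only sees the outer training fold, and the outer test fold is used once to score the chosen setting.

```python
import random
from statistics import mean


def k_fold_splits(indices, k, seed):
    """Yield (train, test) index lists for a shuffled k-fold split of `indices`."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def knn_predict(X, y, train, x, k):
    """Toy stand-in model: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda i: abs(X[i] - x))[:k]
    return 1 if 2 * sum(y[i] for i in nearest) > len(nearest) else 0


def accuracy(X, y, train, test, k):
    """Fraction of `test` points classified correctly using `train` as memory."""
    return mean(1.0 if knn_predict(X, y, train, X[i], k) == y[i] else 0.0
                for i in test)


def nested_cv(X, y, candidates, outer_k=10, inner_k=10, seed=0):
    """Return one held-out score per outer fold."""
    outer_scores = []
    outer = k_fold_splits(range(len(X)), outer_k, seed)
    for fold_no, (outer_train, outer_test) in enumerate(outer):
        # Inner loop: tune the hyperparameter using the outer training fold only.
        # Grid search is used here as a simple stand-in for Bayesian optimization.
        def inner_score(k):
            inner = k_fold_splits(outer_train, inner_k, seed + 1 + fold_no)
            return mean(accuracy(X, y, tr, va, k) for tr, va in inner)

        best_k = max(candidates, key=inner_score)
        # Outer loop: score the tuned setting on data the tuning never saw.
        outer_scores.append(accuracy(X, y, outer_train, outer_test, best_k))
    return outer_scores


# Synthetic data (an assumption for illustration): a noisy threshold at x = 0.5.
rng = random.Random(42)
X = [rng.random() for _ in range(200)]
y = [1 if x + rng.gauss(0, 0.1) > 0.5 else 0 for x in X]

scores = nested_cv(X, y, candidates=[1, 5, 15])
print(f"mean outer-fold accuracy: {mean(scores):.3f}")
```

To move this sketch toward the setting in the abstract, the toy classifier would be replaced by a gradient boosting model and the grid search by a Bayesian optimizer; the fold structure itself would stay the same.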

Bibliographic details

Source
Journal of Chemical Information and Modeling | 2019, Issue 10 | 9 pages
Authors
Zhang Jin; Mucs Daniel; Norinder Ulf; Svensson Fredrik
Author affiliations

Umea Univ, Dept Chem, S-90187 Umea, Sweden

Karolinska Inst, Unit Toxicol Sci, Swetox, Forskargatan 20, SE-15136 Sodertalje, Sweden

Karolinska Inst, Unit Toxicol Sci, Swetox, Forskargatan 20, SE-15136 Sodertalje, Sweden

UCL Drug Discovery Inst, Alzheimers Res UK, Cruciform Bldg, Gower St, London WC1E 6BT, England

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Chemistry; Chemical industry

