Effect of training data size and noise level on support vector machines virtual screening of genotoxic compounds from large compound libraries

Kumar Pankaj; Ma XH; Liu XH; Jia J; Han BC; Xue Y; Li ZR; Yang SY; Wei YQ; Chen YZ


Abstract

Various in vitro and in-silico methods have been used for drug genotoxicity tests, but they show limited genotoxicity (GT+) and non-genotoxicity (GT-) identification rates. New methods and combinatorial approaches have been explored to enhance collective identification capability. The rates of in-silico methods may be further improved by significantly diversified training data enriched by the large number of recently reported GT+ and GT- compounds, but a major concern is the increased noise level arising from the high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machine (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs with different diversity/noise levels were developed and tested. H-SVM, trained on higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test), outperforms L-SVM, trained on lower-noise, lower-diversity data (GT+ in in vivo or Ames tests only). H-SVM, trained on 4,763 GT+ compounds reported before 2008 and 8,232 GT- compounds excluding clinical trial drugs, correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT-, and predicted 23.96% of the 168 K MDDR compounds and 27.23% of the 17.86 M PubChem compounds as GT+. These results are comparable to the 43.1-51.9% GT+ and 75-93% GT- rates of existing in-silico methods, the 58.8% GT+ and 79% GT- rates of the Ames method, and the estimated percentages of 23% in vivo and 31-33% in vitro GT+ compounds in the "universe of chemicals". There is a substantial level of agreement between the H-SVM and L-SVM predictions of GT+ and GT- MDDR compounds and the predictions from TOPKAT. SVM showed good potential for identifying GT+ compounds from large compound libraries based on higher-diversity, higher-noise training data.
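
As a rough sanity check on the figures in the abstract, the reported rates can be converted into approximate compound counts. The sketch below is illustrative Python and not part of the paper. The helper name `implied_count` is hypothetical, and the library sizes are the rounded values quoted in the abstract (168 K and 17.86 M), so the last two counts are only approximate.

```python
# Back-of-the-envelope conversion of the reported rates into compound counts.
# Percentages and set sizes come from the abstract; the MDDR and PubChem sizes
# are rounded there, so the corresponding counts are approximations.

def implied_count(rate_percent: float, total: int) -> int:
    """Return the nearest whole number of compounds for a given rate (hypothetical helper)."""
    return round(rate_percent / 100 * total)

reported = {
    "new GT+ compounds identified as GT+": (81.6, 38),
    "clinical trial drugs predicted GT-": (83.1, 2_008),
    "MDDR compounds predicted GT+": (23.96, 168_000),
    "PubChem compounds predicted GT+": (27.23, 17_860_000),
}

counts = {
    label: implied_count(rate, total)
    for label, (rate, total) in reported.items()
}

for label, n in counts.items():
    print(f"{label}: about {n:,}")
```

Under these assumptions, 81.6% of 38 corresponds to 31 compounds, and 83.1% of 2,008 to roughly 1,669 drugs. The counts for the two large libraries depend on the rounded totals and should be read as order-of-magnitude estimates.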


Bibliographic record

Source
Journal of Computer-Aided Molecular Design | 2011, Issue 5 | 13 pages
Authors
Kumar Pankaj; Ma XH; Liu XH; Jia J; Han BC; Xue Y; Li ZR; Yang SY; Wei YQ; Chen YZ
Original format: PDF
Text language: English
Chinese Library Classification: Applications of electronic computers in chemistry
Keywords
Bioinformatics; Genotoxicity; Computer aided drug design; Drug safety; Machine learning; Statistical learning; Support vector machine

Similar literature

1. Effect of training data size and noise level on support vector machines virtual screening of genotoxic compounds from large compound libraries [J]. Kumar Pankaj, Ma XH, Liu XH. Journal of Computer-Aided Molecular Design. 2011, Issue 5
2. Combinatorial support vector machines approach for virtual screening of selective multi-target serotonin reuptake inhibitors from large compound libraries [J]. Shi Z., Ma X.H., Qin C. Journal of Molecular Graphics & Modelling. 2012
3. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries [J]. Bucong Han, Xiaohua Ma, Ruiying Zhao. Chemistry Central Journal. 2012, Issue 1
4. Structure-based Virtual Screening of Compound Library for Anti-estrogen Breast Cancer Candidates [C]. Xiaoyan Li, Xue Liu, Chaofeng Du. International Conference on Biotechnology, Chemical and Materials Engineering. 2014
5. The Screening of One-Bead-One-Compound (OBOC) Small Molecule Libraries against Phage Display Libraries - The Development of a Novel Multiplex Screening Approach and its Applications [D]. Wu, Chun-Yi. 2010
6. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries [O]. Bucong Han, Xiaohua Ma, Ruiying Zhao. 2012
7. Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries [O]. Han Bucong, Ma Xiaohua, Zhao Ruiying. 2012
