Published in: Molecular Diversity

Prediction of carcinogenicity for diverse chemicals based on substructure grouping and SVM modeling

Abstract

The Carcinogenicity Reliability Database (CRDB) was constructed by collecting experimental carcinogenicity data on about 1,500 chemicals from six sources, including the IARC and NTP databases, and then ranking the reliability of the data into six unified categories. A structurally diverse set of 911 organic chemicals was selected from the database for QSAR modeling, and 1,504 molecular descriptors were calculated with the Dragon software from their 3D molecular structures. Using atom-count and functional-group-count descriptors, the positive (carcinogenic) and negative (non-carcinogenic) chemicals containing each substructure were counted, and the ratio of positives to negatives for each substructure was tested for statistical significance. Among the substructures that biomedical studies have identified as responsible for carcinogenicity, very few were found to be strongly related to carcinogenicity. To develop QSAR models that predict the carcinogenicity of a wide variety of chemicals at a satisfactory performance level, the relationship between the carcinogenicity data of improved reliability and a subset of significant descriptors selected from the 1,504 Dragon descriptors was analyzed with a support vector machine (SVM). The support vector classification (SVC) function for weighted data in the LIBSVM program was used to classify chemicals into two carcinogenicity categories (positive or negative), with each chemical's weight set according to the reliability of its carcinogenicity data. The quality and stability of the models were tested with a dual cross-validation procedure. As a first step, a single SVM model using 250 selected descriptors was developed for all 911 chemicals; it achieved an overall accuracy (the fraction of positive and negative chemicals predicted correctly) of about 70%.
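
The abstract states that each chemical's SVC weight depends on the reliability of its carcinogenicity data, but it does not give the kernel, the weight values, or the solver. As a rough illustration only, the sketch below trains a linear SVM by full-batch subgradient descent on a hinge loss in which each chemical's term is multiplied by a weight looked up from its reliability category. The `RELIABILITY_WEIGHTS` table, the linear kernel, the learning-rate and epoch settings, and the toy data are all hypothetical; the authors used LIBSVM, not this code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights for the six unified CRDB reliability categories
# (1 = most reliable); the paper's actual values are not given in the abstract.
RELIABILITY_WEIGHTS: Dict[int, float] = {1: 1.0, 2: 0.9, 3: 0.75, 4: 0.6, 5: 0.45, 6: 0.3}


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_weighted_linear_svm(
    X: List[List[float]],
    y: List[int],            # +1 = carcinogenic, -1 = non-carcinogenic
    reliability: List[int],  # reliability category (1..6) of each chemical
    C: float = 1.0,
    lr: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i*(w.x_i + b))
    by full-batch subgradient descent, where s_i is the reliability weight."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    s = [RELIABILITY_WEIGHTS[r] for r in reliability]
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            # Margin violated (inside the margin or misclassified): hinge term active,
            # scaled by the chemical's reliability weight si.
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(n_features):
                    grad_w[j] -= C * si * yi * xi[j]
                grad_b -= C * si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) + b >= 0.0 else -1


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy: fraction of positives and negatives predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy two-descriptor data, not CRDB values.
    X = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0], [-2.0, -1.0], [-3.0, -2.0], [-1.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    reliability = [1, 2, 6, 1, 3, 5]
    w, b = train_weighted_linear_svm(X, y, reliability)
    print("accuracy:", accuracy(y, [predict(w, b, x) for x in X]))
```

A lower weight shrinks a chemical's penalty for falling on the wrong side of the margin, so less reliable labels pull the decision boundary less. In scikit-learn, the `sample_weight` argument of `SVC.fit`, which rescales C per sample, plays a comparable role for kernel SVMs.
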
To improve the accuracy of the final model, the 911 chemicals were divided into 20 mutually overlapping subgroups according to the substructures they contain, and a separate SVM model was optimized for each subgroup. The predicted carcinogenicity of each chemical was then determined by majority vote over the outputs of the SVM models for the subgroups to which it belongs. The model built on this grouping of chemicals by 20 substructures predicts the carcinogenicity of a wide variety of chemicals with a satisfactory overall accuracy of approximately 80%.
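
The subgroup scheme can be pictured as follows: a chemical belongs to every subgroup whose defining substructure it contains, the model of each of those subgroups casts a vote, and the majority decides. This is a minimal sketch under stated assumptions: the three subgroups, the count-descriptor names (`n_nitro`, `n_ar_nh2`, `n_halogen`), the stand-in models, the global fallback for chemicals in no subgroup, and the tie-breaking rule are all hypothetical, since the abstract does not say how ties or unassigned chemicals are handled.

```python
from typing import Callable, List, Mapping

# A subgroup model maps a chemical's descriptor values to +1 (carcinogenic) or -1.
Model = Callable[[Mapping[str, float]], int]


def subgroups_of(chemical: Mapping[str, float], definitions: Mapping[str, str]) -> List[str]:
    """Subgroups whose defining substructure count is > 0 for this chemical.
    Subgroups overlap: one chemical may belong to several."""
    return [group for group, count_name in definitions.items()
            if chemical.get(count_name, 0) > 0]


def predict_by_majority(
    chemical: Mapping[str, float],
    definitions: Mapping[str, str],
    models: Mapping[str, Model],
    fallback: Model,
    tie: int = 1,
) -> int:
    """Majority vote over the models of every subgroup the chemical belongs to."""
    groups = subgroups_of(chemical, definitions)
    if not groups:
        return fallback(chemical)  # hypothetical: global model for unassigned chemicals
    votes = sum(models[g](chemical) for g in groups)
    if votes > 0:
        return 1
    if votes < 0:
        return -1
    return tie  # hypothetical tie-breaking rule; not stated in the abstract


# Hypothetical subgroup definitions: subgroup name -> substructure count descriptor.
DEFINITIONS = {"nitro": "n_nitro", "aromatic_amine": "n_ar_nh2", "halide": "n_halogen"}

# Stand-ins for the per-subgroup SVM models (each would really be a trained SVC).
MODELS: Mapping[str, Model] = {
    "nitro": lambda c: 1,
    "aromatic_amine": lambda c: 1 if c["logP"] > 2.0 else -1,
    "halide": lambda c: -1,
}


def global_model(chemical: Mapping[str, float]) -> int:
    return -1  # stand-in for the single all-chemical SVM


if __name__ == "__main__":
    chemical = {"n_nitro": 1, "n_ar_nh2": 1, "n_halogen": 1, "logP": 2.5}
    print(subgroups_of(chemical, DEFINITIONS))
    print(predict_by_majority(chemical, DEFINITIONS, MODELS, global_model))
```

Making `tie` an explicit parameter keeps the unspecified even-split case visible; a chemical that belongs to an odd number of subgroups never reaches it.
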
