Foreign-language journal: Molecular Informatics

The Development of Novel Chemical Fragment-Based Descriptors Using Frequent Common Subgraph Mining Approach and Their Application in QSAR Modeling


Abstract

We present a novel approach to generating fragment-based molecular descriptors. Molecules are represented as labeled undirected chemical graphs. Fast Frequent Subgraph Mining (FFSM) is used to find chemical fragments (subgraphs) that occur in at least a minimum fraction of the molecules in a dataset. The collection of frequent subgraphs (FSG) forms a set of dataset-specific descriptors whose value for each molecule is the number of times the corresponding frequent fragment occurs in that molecule. We have employed the FSG descriptors to develop variable-selection k-Nearest Neighbor (kNN) QSAR models for several datasets with a binary target property, including Maximum Recommended Therapeutic Dose (MRTD), Salmonella Mutagenicity (Ames Genotoxicity), and P-Glycoprotein (PGP) data. Each dataset was divided into training, test, and validation sets to establish statistical figures of merit reflecting the models' validated predictive power. The classification accuracies of the models for both training and test sets exceeded 75% for all datasets, and the accuracy for the external validation sets exceeded 72%. The model accuracies were comparable to or better than those reported earlier in the literature for the same datasets. Furthermore, the use of fragment-based descriptors affords mechanistic interpretation of validated QSAR models in terms of the essential chemical fragments responsible for the compounds' target property.
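
The descriptor definition in the abstract (fragment occurrence counts, restricted to fragments that are frequent across the dataset) can be illustrated with a small Python sketch. This is not FFSM: as a simplifying assumption it enumerates only linear fragments (simple paths of two or three atoms) rather than arbitrary subgraphs. The helper names `make_mol`, `path_fragments`, and `fsg_descriptors`, the `min_support` parameter, and the three toy molecules are hypothetical and do not come from the paper.

```python
from collections import Counter

def make_mol(atoms, bonds):
    """Build a labeled undirected graph: atom labels plus an adjacency list."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j, bond in bonds:
        adj[i].append((j, bond))
        adj[j].append((i, bond))
    return atoms, adj

def path_fragments(mol, max_atoms=3):
    """Count linear fragments (simple paths) with 2..max_atoms atoms.

    Assumption: paths stand in for the general subgraphs that FFSM mines.
    """
    atoms, adj = mol
    counts = Counter()

    def extend(path, labels):
        if len(path) >= 2:
            fwd = "".join(labels)
            rev = "".join(reversed(labels))
            counts[min(fwd, rev)] += 1  # same key for both directions
        if len(path) == max_atoms:
            return
        for nxt, bond in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], labels + [bond, atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    # Each path is reached once from each end, so halve the raw counts.
    return Counter({frag: n // 2 for frag, n in counts.items()})

def fsg_descriptors(mols, min_support=0.5):
    """Keep fragments present in >= min_support of molecules; return counts."""
    per_mol = [path_fragments(m) for m in mols]
    support = Counter()
    for counts in per_mol:
        support.update(counts.keys())  # one vote per molecule containing it
    frequent = sorted(f for f, s in support.items() if s / len(mols) >= min_support)
    matrix = [[counts.get(f, 0) for f in frequent] for counts in per_mol]
    return frequent, matrix

# Toy dataset (heavy atoms only, single bonds written as "-").
ethanol = make_mol(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
diethyl_ether = make_mol(["C", "C", "O", "C", "C"],
                         [(0, 1, "-"), (1, 2, "-"), (2, 3, "-"), (3, 4, "-")])
ethylamine = make_mol(["C", "C", "N"], [(0, 1, "-"), (1, 2, "-")])

frequent, matrix = fsg_descriptors([ethanol, diethyl_ether, ethylamine])
print(frequent)  # ['C-C', 'C-C-O', 'C-O']
print(matrix)    # [[1, 1, 1], [2, 2, 2], [1, 0, 0]]
```

Each row of `matrix` is one molecule's descriptor vector. Fragments found in only one of the three molecules, such as `C-N`, fall below the 50% support threshold and are dropped, which is what makes the descriptor set dataset-specific.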
