Journal of Chemical Information and Modeling

Structural Similarity Based Kriging for Quantitative Structure Activity and Property Relationship Modeling

Abstract

Structurally similar molecules tend to have similar properties; that is, molecules that lie close together in molecular space are more likely to yield similar property values, while distant molecules are more likely to yield different values. Based on this principle, we propose a new method that takes into account the high dimensionality of the molecular space and predicts chemical, physical, or biological properties from the most similar compounds with measured properties. The methodology uses ordinary kriging coupled with three different molecular similarity approaches (based on molecular descriptors, fingerprints, and atom matching), creating an interpolation map over the molecular space that is capable of predicting properties/activities for diverse chemical data sets. The proposed method was tested on two data sets of diverse chemical compounds collected from the literature and preprocessed. One data set contained dihydrofolate reductase inhibition activity data, and the second contained molecules for which aqueous solubility was known. The overall predictive results using kriging for both data sets are consistent with those obtained in the literature using typical QSPR/QSAR approaches. However, the procedure did not involve any type of descriptor selection or even minimal information about each problem, suggesting that this approach is directly applicable to a large spectrum of problems in QSAR/QSPR. Furthermore, the predictive results improve significantly as the similarity threshold between the training and testing compounds increases, allowing the definition of a confidence threshold of similarity and an error estimate for each inferred case. The use of kriging for interpolation over the molecular metric space is independent of the training data set size: no reparametrization is necessary when compounds are added to or removed from the set, and increasing the size of the database will consequently improve the quality of the estimations. Finally, it is shown that this model can be used for checking the consistency of measured data and for guiding an extension of the training set by determining the regions of the molecular space for which new experimental measurements could be used to maximize the model's predictive performance.
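
The idea of interpolating a property over molecular space with ordinary kriging can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes fingerprints are given as sets of on-bit indices, uses Tanimoto distance as the metric, and uses an exponential variogram whose `sill`, `range_`, and `nugget` values are placeholders rather than fitted parameters. All function names are hypothetical.

```python
import numpy as np

def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union

def exponential_variogram(h, sill=1.0, range_=0.5, nugget=0.0):
    """Assumed variogram model; parameters are illustrative, not fitted."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    # By definition the variogram is zero at zero distance.
    return np.where(h == 0.0, 0.0, gamma)

def ordinary_kriging(train_fps, train_y, query_fp, variogram=exponential_variogram):
    """Return (estimate, kriging variance) for one query molecule."""
    y = np.asarray(train_y, dtype=float)
    n = len(train_fps)
    # Pairwise distances among training molecules and to the query.
    D = np.array([[tanimoto_distance(a, b) for b in train_fps] for a in train_fps])
    d0 = np.array([tanimoto_distance(a, query_fp) for a in train_fps])
    # Ordinary kriging system: [[Gamma, 1], [1^T, 0]] [w; mu] = [gamma0; 1],
    # where the last row forces the weights to sum to one.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = variogram(D)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    gamma0 = variogram(d0)
    b = np.append(gamma0, 1.0)
    # Least squares tolerates a singular system (e.g. duplicate fingerprints).
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    w, mu = sol[:n], sol[n]
    estimate = float(w @ y)
    variance = float(w @ gamma0 + mu)
    return estimate, variance

# Toy data: three made-up fingerprints with made-up property values.
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
values = [1.0, 2.0, 5.0]
est, var = ordinary_kriging(fps, values, {1, 2, 3, 4})
print(f"estimate={est:.3f}, kriging variance={var:.3f}")
```

The kriging variance returned with each estimate is one way to obtain the per-case error estimate mentioned in the abstract. Because the weights are recomputed from the distances at query time, adding or removing training compounds only changes the linear system and requires no retraining step in this sketch.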
