Journal: ACS Omega

Experimental Errors in QSAR Modeling Sets: What We Can Do and What We Cannot Do

Abstract

Numerous chemical data sets have become available for quantitative structure–activity relationship (QSAR) modeling studies. However, data quality may vary between sources depending on the nature of the experimental protocols. Potential experimental errors in the modeling sets may therefore lead to poor QSAR models and degrade the predictions for new compounds. In this study, we explored the relationship between the ratio of questionable data in the modeling sets, which was obtained by simulating experimental errors, and the QSAR modeling performance. To this end, we used eight data sets (four continuous endpoints and four categorical endpoints) that have been extensively curated both in-house and by our collaborators to create over 1800 QSAR models. Each data set was duplicated to create several new modeling sets with different ratios of simulated experimental errors (i.e., randomizing the activities of part of the compounds) in the modeling process. A fivefold cross-validation process was used to evaluate the modeling performance, which deteriorates when the ratio of experimental errors increases. All of the resulting models were also used to predict external sets of new compounds, which were excluded at the beginning of the modeling process. The modeling results showed that the compounds with relatively large prediction errors in cross-validation processes are likely to be those with simulated experimental errors. However, after removing a certain number of compounds with large prediction errors in the cross-validation process, the external predictions of new compounds did not show improvement. Our conclusion is that QSAR predictions, especially consensus predictions, can identify compounds with potential experimental errors. However, removing those compounds on the basis of the cross-validation procedure is not a reasonable means of improving model predictivity, because it leads to overfitting.
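The error-simulation and cross-validation steps described in the abstract can be illustrated with a small toy sketch. This is not the authors' code: the single numeric descriptor, the nearest-neighbor stand-in model, and the names `simulate_errors`, `knn_predict`, and `cross_val_errors` are all hypothetical, and shuffling activities within the selected subset is only one possible reading of "randomizing the activities of part of the compounds."

```python
import random


def simulate_errors(activities, ratio, seed=0):
    """Permute the activities of a random `ratio` fraction of compounds.

    Assumption: "randomizing" is modeled as shuffling values within the
    selected subset, so a few selected compounds may keep their value.
    Returns the perturbed list and the sorted indices that were selected.
    """
    rng = random.Random(seed)
    n_noisy = round(ratio * len(activities))
    idx = rng.sample(range(len(activities)), n_noisy)
    values = [activities[i] for i in idx]
    rng.shuffle(values)
    noisy = list(activities)
    for i, v in zip(idx, values):
        noisy[i] = v
    return noisy, sorted(idx)


def knn_predict(train_x, train_y, x, k=3):
    """Toy stand-in for a QSAR model: mean activity of the k nearest compounds."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]
    return sum(train_y[j] for j in nearest) / len(nearest)


def cross_val_errors(xs, ys, n_folds=5, seed=0):
    """Absolute prediction error of every compound when it is held out."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    errors = [0.0] * len(xs)
    for fold in range(n_folds):
        test = set(order[fold::n_folds])
        train = [i for i in order if i not in test]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            errors[i] = abs(knn_predict(train_x, train_y, xs[i]) - ys[i])
    return errors


# Synthetic data: one descriptor, activity is a clean linear function of it.
xs = [i / 100 for i in range(200)]
ys = [2.0 * x for x in xs]

# Inject simulated experimental errors into 10% of the compounds.
noisy, noisy_idx = simulate_errors(ys, ratio=0.10, seed=1)

# Rank compounds by fivefold cross-validation error and flag the worst ones.
cv_errors = cross_val_errors(xs, noisy, n_folds=5, seed=1)
ranked = sorted(range(len(xs)), key=lambda i: cv_errors[i], reverse=True)
flagged = ranked[: len(noisy_idx)]

# Fraction of perturbed compounds recovered among the flagged ones.
hit_rate = len(set(flagged) & set(noisy_idx)) / len(noisy_idx)
print(f"perturbed compounds among top-ranked CV errors: {hit_rate:.0%}")
```

The sketch covers only the identification step (large cross-validation errors pointing at perturbed compounds). It does not test the abstract's second finding, which concerns external predictions after the flagged compounds are removed.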