Journal of Computer-Aided Molecular Design

Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays


Abstract

Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example, from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols that take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we were able to achieve correct classification rates between 70% and 85% on the prediction set, though the accuracy dropped significantly when the models were applied to in vivo data. In this context, we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated with, and potentially predicted by, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
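
The abstract does not spell out which protocol was used to handle class imbalance. As an illustration only, and not necessarily the authors' method, one common approach is to train each member of an ensemble on a balanced subset: all compounds of the rare class plus an equally sized random draw from the common class, with the final call made by majority vote. The Python sketch below shows that bookkeeping with the standard library alone. The function names (`balanced_subsets`, `majority_vote`, `per_class_accuracy`) and the toy labels are hypothetical; in practice a classifier such as a random forest would be fitted on each subset.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Return index lists, each holding every minority-class item plus an
    equally sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]


def majority_vote(predictions):
    """Combine per-model predictions (one list per model) by majority vote.
    Use an odd number of models to avoid ties in the two-class case."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def per_class_accuracy(y_true, y_pred):
    """Fraction of correct predictions, computed separately for each class."""
    correct, total = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy labels invented for illustration: 1 = toxic (rare), 0 = non-toxic (common).
labels = [1] * 5 + [0] * 95
subsets = balanced_subsets(labels, n_subsets=3)

# A model that always predicts "non-toxic" is right on 95 of 100 compounds,
# yet it never identifies a toxic one; per-class rates expose this.
always_negative = per_class_accuracy(labels, [0] * len(labels))
```

Reporting correct classification rates per class, rather than a single overall figure, is one way to keep a heavily imbalanced dataset from masking poor performance on the rare class.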
