Journal of Chemical Information and Modeling

Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature

Abstract

Reaction classification has long been considered an important task for many applications, and it has traditionally been accomplished with hand-coded, rule-based approaches. However, the availability of large reaction collections now enables data-driven approaches to be developed. We present the development and validation of a 336-class machine-learning classification model integrated within a Conformal Prediction (CP) framework, which associates each reaction class prediction with a confidence estimate. We also propose a data-driven approach to "dynamic" reaction fingerprinting that maximizes the effectiveness of reaction encoding, and we develop a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP-augmented model can be improved by defining confidence thresholds that identify the predictions least likely to be false. For example, in external validation, 95% of the retained predictions are correct after fewer than 15% of the classifications are filtered out as uncertain. We demonstrate the model by classifying two reaction data sets: one extracted from an industrial electronic lab notebook (ELN) and the other from the medicinal chemistry literature. We show how confidence estimates and class compositions across the different hierarchical levels can provide immediate insights into the nature of reaction collections and reveal hidden relationships between reaction classes.
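The abstract does not specify the underlying classifier, the nonconformity measure, or exactly how the confidence threshold is defined. As a minimal sketch under stated assumptions, the snippet below uses a generic label-conditional (Mondrian) inductive conformal predictor with nonconformity score 1 − (predicted probability of the class), and the common CP definitions of confidence (1 minus the second-largest p-value) and credibility (the largest p-value). The functions `calibration_scores`, `p_values`, and `classify`, the class names `class_A`/`class_B`, and the calibration numbers are all hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy calibration data: classifier probabilities for held-out
# reactions whose true class is known. Class names are placeholders, not SHREC labels.
CAL_PROBS: List[Dict[str, float]] = [
    {"class_A": 0.9, "class_B": 0.1},
    {"class_A": 0.8, "class_B": 0.2},
    {"class_A": 0.6, "class_B": 0.4},
    {"class_A": 0.4, "class_B": 0.6},
    {"class_A": 0.1, "class_B": 0.9},
    {"class_A": 0.3, "class_B": 0.7},
    {"class_A": 0.5, "class_B": 0.5},
    {"class_A": 0.6, "class_B": 0.4},
]
CAL_LABELS: List[str] = ["class_A"] * 4 + ["class_B"] * 4


def calibration_scores(
    probs: Sequence[Dict[str, float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Group nonconformity scores (1 - probability of the true class) by class."""
    scores: Dict[str, List[float]] = {}
    for p, y in zip(probs, labels):
        scores.setdefault(y, []).append(1.0 - p.get(y, 0.0))
    return scores


def p_values(probs: Dict[str, float], scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Label-conditional p-value for every class that appears in the calibration set."""
    pvals: Dict[str, float] = {}
    for label, cal in scores.items():
        alpha = 1.0 - probs.get(label, 0.0)  # nonconformity if the true class were `label`
        n_at_least = sum(1 for a in cal if a >= alpha)
        pvals[label] = (n_at_least + 1) / (len(cal) + 1)
    return pvals


def classify(
    probs: Dict[str, float], scores: Dict[str, List[float]], min_confidence: float
) -> Tuple[str, float, float, bool]:
    """Return (label, confidence, credibility, accepted) for one reaction."""
    ranked = sorted(p_values(probs, scores).items(), key=lambda kv: kv[1], reverse=True)
    label, credibility = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    confidence = 1.0 - second
    # Predictions below the threshold are filtered out as uncertain.
    return label, confidence, credibility, confidence >= min_confidence


if __name__ == "__main__":
    scores = calibration_scores(CAL_PROBS, CAL_LABELS)
    for probs in ({"class_A": 0.95, "class_B": 0.05}, {"class_A": 0.5, "class_B": 0.5}):
        label, conf, cred, ok = classify(probs, scores, min_confidence=0.75)
        print(f"{label}: confidence={conf:.2f} credibility={cred:.2f} accepted={ok}")
```

With these toy numbers, the clear-cut reaction is assigned `class_A` with confidence 0.80 and kept, while the 50/50 reaction gets confidence 0.60 and is filtered out. Raising `min_confidence` trades coverage (fewer retained predictions) for a higher fraction of correct ones, which is the kind of trade-off the abstract's external-validation figures describe.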
