Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature

Ghiandoni Gian Marco; Bodkin Michael J.; Chen Beining; Hristozov Dimitar; Wallace James E. A.; Webster James; Gillet Valerie J.


Abstract

Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for "dynamic" reaction fingerprinting to maximize the effectiveness of reaction encoding, as well as developing a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights on the nature of reaction collections and hidden relationships between reaction classes.
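The confidence-thresholding idea described in the abstract can be illustrated with a minimal sketch of inductive conformal prediction. This is not the authors' implementation. The nonconformity measure (one minus the predicted class probability), the function names, and the toy three-class numbers are all assumptions chosen for illustration. The sketch computes a p-value per class from a calibration set, takes confidence as one minus the second-largest p-value, and keeps only predictions whose confidence meets a threshold.

```python
from typing import List, Sequence, Tuple


def calibration_scores(probs: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> List[float]:
    """Nonconformity of each calibration example: 1 - probability of its true class."""
    return [1.0 - p[y] for p, y in zip(probs, labels)]


def p_values(test_probs: Sequence[float],
             cal_scores: Sequence[float]) -> List[float]:
    """Conformal p-value for every candidate class of one test example."""
    n = len(cal_scores)
    result = []
    for p in test_probs:
        alpha = 1.0 - p  # nonconformity if this class were the true one
        at_least = sum(1 for s in cal_scores if s >= alpha)
        result.append((at_least + 1) / (n + 1))
    return result


def predict_with_confidence(test_probs: Sequence[float],
                            cal_scores: Sequence[float]) -> Tuple[int, float, float]:
    """Return (predicted class, confidence, credibility)."""
    pv = p_values(test_probs, cal_scores)
    ranked = sorted(range(len(pv)), key=lambda c: pv[c], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # confidence: 1 - second-largest p-value; credibility: largest p-value
    return best, 1.0 - pv[runner_up], pv[best]


def filter_confident(batch: Sequence[Sequence[float]],
                     cal_scores: Sequence[float],
                     threshold: float) -> List[Tuple[int, int, float]]:
    """Keep (index, class, confidence) for predictions at or above the threshold."""
    kept = []
    for i, probs in enumerate(batch):
        label, conf, _ = predict_with_confidence(probs, cal_scores)
        if conf >= threshold:
            kept.append((i, label, conf))
    return kept


# Toy calibration set (hypothetical classifier probabilities and true labels).
cal_probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.4, 0.5, 0.1]]
cal_labels = [0, 1, 2, 0]
scores = calibration_scores(cal_probs, cal_labels)

# One clear-cut prediction and one ambiguous prediction.
test_batch = [[0.95, 0.03, 0.02], [0.45, 0.45, 0.10]]
kept = filter_confident(test_batch, scores, threshold=0.7)
print(kept)
```

With these toy numbers the ambiguous second example falls below the threshold and is filtered out, which mirrors the trade-off the abstract reports: discarding a fraction of uncertain classifications raises the proportion of correct predictions among those that remain.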


Bibliographic details

Source
Journal of Chemical Information and Modeling | 2019, Issue 10 | 21 pages
Authors
Ghiandoni Gian Marco; Bodkin Michael J.; Chen Beining; Hristozov Dimitar; Wallace James E. A.; Webster James; Gillet Valerie J.
Author affiliations

Univ Sheffield Informat Sch 211 Portobello Sheffield S1 4DP S Yorkshire England;

Evotec UK Ltd 114 Innovat Dr Milton Pk Abingdon OX14 4RZ Oxon England;

Univ Sheffield Chem Dept Dainton Bldg Brook Hill Sheffield S3 7HF S Yorkshire England;

Evotec UK Ltd 114 Innovat Dr Milton Pk Abingdon OX14 4RZ Oxon England;

Evotec UK Ltd 114 Innovat Dr Milton Pk Abingdon OX14 4RZ Oxon England;

Univ Sheffield Informat Sch 211 Portobello Sheffield S1 4DP S Yorkshire England;

Univ Sheffield Informat Sch 211 Portobello Sheffield S1 4DP S Yorkshire England;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Chemistry; Chemical industry
