
Mining the Critical Conditions for New Hypotheses of Materials from Historical Reaction Data




New findings in materials science often carry a high research cost, for two reasons. First, chemical reaction processes need continuous optimization, and routine experiments can consume large quantities of valuable reactants and apparatus. Second, the success of a designed experiment relies heavily on the researchers' experience. Since the launch of the Materials Genome Initiative (MGI), researchers have begun to record historical reaction data and to seek new solutions through computational techniques such as data mining and machine learning. In this paper, we study the reaction data of inorganic-organic hybrid materials from the Dark Reaction Project at Haverford College using simple machine learning algorithms (i.e., Bayes Net, SVM and C4.5), ensemble learning models (i.e., Random Forest, Stacking, Gradient Boosting Decision Tree (GBDT) and XGBoost), and deep neural network models. Beyond the accuracy of the prediction models, we also use different ranking algorithms to analyze which reaction conditions are chemically significant. Through a series of evaluations, we find that a well-designed stacking-based ensemble learning model reaches the highest prediction accuracy, 61% (8% higher than GBDT and 5% higher than XGBoost), on the top-50 feature subset selected by 'symmetrical uncertainty ranking', evaluated on a standalone data set that was not previously used in the Dark Reaction Project.
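The 'symmetrical uncertainty ranking' mentioned above can be illustrated with a minimal sketch. It uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and assumes every feature has already been discretized; the helper names and the toy feature names (`temp_bin`, `ph_bin`, `noise`) are hypothetical and are not taken from the paper or from the Dark Reaction Project data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest SU against labels."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: four reactions, outcome 1 = success, 0 = failure.
toy_labels = [1, 1, 0, 0]
toy_columns = {
    "temp_bin": ["hi", "hi", "lo", "lo"],  # fully determines the outcome
    "ph_bin": ["x", "x", "x", "y"],        # partially informative
    "noise": ["a", "b", "a", "b"],         # independent of the outcome
}
top_two = rank_features(toy_columns, toy_labels, k=2)
```

In this toy case the ranking keeps the two informative features and drops the independent one. Continuous reaction conditions would need to be binned before such a score can be computed, and the binning scheme is a separate choice that this sketch does not address.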


