The Journal of Chemical Physics

Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation

Abstract

Free energies govern the behavior of soft and liquid matter, and improving their predictions could have a large impact on the development of drugs, electrolytes, or homogeneous catalysts. Unfortunately, it is challenging to devise an accurate description of the effects governing solvation, such as hydrogen bonding, van der Waals interactions, or conformational sampling. We present a Free energy Machine Learning (FML) model applicable throughout chemical compound space and based on a representation that employs Boltzmann averages to account for an approximated sampling of configurational space. Using the FreeSolv database, FML's out-of-sample prediction errors of experimental hydration free energies decay systematically with training set size, and experimental uncertainty (0.6 kcal/mol) is reached after training on 490 molecules (80% of FreeSolv). Corresponding FML model errors are on par with state-of-the-art physics-based approaches. To generate the input representation for a new query compound, FML requires approximate and short molecular dynamics runs. We showcase its usefulness through analysis of solvation free energies for 116k organic molecules (all force-field compatible molecules in the QM9 database), identifying the most and least solvated systems and rediscovering quasi-linear structure-property relationships in terms of simple descriptors such as hydrogen-bond donors, number of NH or OH groups, number of oxygen atoms in hydrocarbons, and number of heavy atoms. FML's accuracy is maximal when the temperature of the molecular dynamics simulations used to generate the averaged input representation is the same for the training compounds and the query compounds. The prediction error converges rapidly with respect to the sampling time used for the representation.
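The ensemble-representation idea can be illustrated with a minimal sketch. The abstract does not specify the per-frame molecular representation or the regression model, so everything below is an assumption made for illustration: per-frame feature vectors are taken as given NumPy arrays, the regressor is a plain Gaussian kernel ridge regression, and all function names (`boltzmann_weights`, `ensemble_representation`, `fit_krr`, `predict_krr`) are hypothetical. Frames sampled from a molecular dynamics run at temperature T are already Boltzmann-distributed, so their Boltzmann average reduces to a simple mean; explicit weights are only needed for a discrete set of conformers with known energies.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights for conformer energies given in kcal/mol."""
    energies = np.asarray(energies, dtype=float)
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    # Shift by the minimum energy for numerical stability; weights are unchanged.
    w = np.exp(-beta * (energies - energies.min()))
    return w / w.sum()


def ensemble_representation(frame_features, energies=None, temperature=298.15):
    """Average per-frame feature vectors into one vector per molecule.

    frame_features: array of shape (n_frames, n_features).
    If `energies` is None, the frames are assumed to come from an MD run at
    the target temperature, so a plain mean is the Boltzmann average.
    """
    X = np.asarray(frame_features, dtype=float)
    if energies is None:
        return X.mean(axis=0)
    return boltzmann_weights(energies, temperature) @ X


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))


def fit_krr(X_train, y_train, sigma=1.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    X_train = np.asarray(X_train, dtype=float)
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), np.asarray(y_train, dtype=float))


def predict_krr(X_train, alpha, X_query, sigma=1.0):
    """Predict free energies for query representations."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    return gaussian_kernel(X_query, X_train, sigma) @ alpha
```

In this sketch each training molecule contributes one averaged vector from `ensemble_representation`, paired with its experimental hydration free energy. A query compound would be processed the same way, using frames generated at the same temperature as the training set, which is the condition the abstract reports as giving maximal accuracy.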
