Journal: Molecular Systems Design & Engineering

Designing compact training sets for data-driven molecular property prediction through optimal exploitation and exploration


Abstract

In this paper, we consider the problem of designing a compact training set comprising the most informative molecules from a specified library to build data-driven molecular property models. Specifically, using (i) sparse generalized group additivity and (ii) kernel ridge regression as two representative classes of models, we propose a method combining rigorous model-based design of experiments and cheminformatics-based diversity-maximizing subset selection within the ε-greedy framework to systematically minimize the amount of data needed to train these models. We demonstrate the effectiveness of the algorithm on various databases, including QM7, NIST, and a dataset of surface intermediates for calculating thermodynamic properties (heat of atomization and enthalpy of formation). For sparse group additive models, a balance between exploration (diversity-maximizing selection) and exploitation (D-optimality selection) leads to learning with a fraction (sometimes as little as 15%) of the data to achieve similar accuracy to five-fold cross-validation on the entire set. On the other hand, our results indicate that kernel methods prefer diversity-maximizing selection.
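The exploration/exploitation trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the general shape of an ε-greedy subset selection under several assumptions that do not come from the paper: the library is given as a plain numeric feature matrix `X`, exploration uses a MaxMin rule on Euclidean distance (a stand-in for a cheminformatics similarity measure), exploitation uses a greedy D-optimality step on a ridge-regularized information matrix, and the function name `epsilon_greedy_select` and its parameters are hypothetical.

```python
import numpy as np


def epsilon_greedy_select(X, n_select, epsilon=0.3, ridge=1e-6, seed=0):
    """Return n_select distinct row indices of X (hypothetical helper).

    With probability epsilon a step explores (MaxMin diversity);
    otherwise it exploits (greedy D-optimality).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 1 <= n_select <= n:
        raise ValueError("n_select must be between 1 and the number of rows")
    rng = np.random.default_rng(seed)

    # Assumption: seed the set with the row closest to the library centroid.
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    selected = [first]
    available = np.ones(n, dtype=bool)
    available[first] = False

    # Regularized information matrix M = X_S^T X_S + ridge * I.
    M = ridge * np.eye(p) + np.outer(X[first], X[first])
    # Distance from every row to its nearest already-selected row.
    min_dist = np.linalg.norm(X - X[first], axis=1)

    while len(selected) < n_select:
        if rng.random() < epsilon:
            # Exploration: pick the row farthest from the current selection.
            score = min_dist.copy()
        else:
            # Exploitation: det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
            # so the greedy D-optimal pick maximizes x^T M^{-1} x.
            Minv_Xt = np.linalg.solve(M, X.T)            # shape (p, n)
            score = np.einsum("ij,ji->i", X, Minv_Xt)   # one value per row
        score[~available] = -np.inf
        pick = int(np.argmax(score))

        selected.append(pick)
        available[pick] = False
        M += np.outer(X[pick], X[pick])
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[pick], axis=1))
    return selected


if __name__ == "__main__":
    # Synthetic stand-in for a molecular descriptor matrix (not real data).
    library = np.random.default_rng(1).normal(size=(200, 8))
    print(epsilon_greedy_select(library, n_select=30, epsilon=0.3))
```

Setting `epsilon=0.0` in this sketch gives purely D-optimal selection and `epsilon=1.0` gives purely diversity-maximizing selection, which mirrors the two extremes the abstract contrasts for group additive and kernel models.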
