International Conference on Artificial Neural Networks

Improving Deep Generative Models with Randomized SMILES



Abstract

A Recurrent Neural Network (RNN) trained on a set of molecules represented as SMILES strings can generate millions of distinct, valid, and meaningful chemical structures. In most reported architectures, models have been trained on a canonical SMILES representation (unique for each molecule). This research shows instead that when randomized SMILES are used as a data amplification technique, a model generates more molecules, and those molecules accurately reflect the properties of the training set. To demonstrate this, an extensive benchmark study was conducted, building on a recently published article showing that models trained with molecules from the GDB-13 database (975 million molecules) achieve better overall chemical space coverage when the posterior probability distribution is as uniform as possible. Specifically, we created models that generate nearly all of the GDB-13 chemical space using only 1 million molecules as a training set. Finally, models trained on smaller training sets show substantial improvement when randomized SMILES are used instead of canonical ones.
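The data amplification idea rests on the fact that one molecule maps to many equivalent SMILES strings, depending on which atom a traversal starts from and in what order neighbors are visited. A minimal pure-Python sketch of this is shown below; it handles only acyclic, single-bonded toy molecules, whereas real pipelines use a cheminformatics toolkit such as RDKit (whose `MolToSmiles` accepts a `doRandom` flag). The `atoms`/`bonds` encoding of ethanol here is purely illustrative.

```python
import random

def randomized_smiles(atoms, bonds, rng=random):
    """Emit one randomized SMILES string for an acyclic, single-bonded molecule.

    atoms: list of element symbols, e.g. ["C", "C", "O"] for ethanol.
    bonds: list of (i, j) atom-index pairs.
    The starting atom and the neighbor visiting order are random, so repeated
    calls yield different but chemically equivalent SMILES strings.
    """
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    def dfs(node, parent):
        # Visit neighbors (except the atom we came from) in random order.
        order = rng.sample(adj[node], len(adj[node]))
        branches = [dfs(n, node) for n in order if n != parent]
        # Every branch except the last is wrapped in parentheses.
        tail = branches[-1] if branches else ""
        return atoms[node] + "".join(f"({b})" for b in branches[:-1]) + tail

    return dfs(rng.randrange(len(atoms)), None)

# Ethanol (C-C-O): sampling repeatedly surfaces several equivalent strings.
rng = random.Random(0)
variants = {randomized_smiles(["C", "C", "O"], [(0, 1), (1, 2)], rng)
            for _ in range(200)}
print(sorted(variants))  # a subset of {'C(C)O', 'C(O)C', 'CCO', 'OCC'}
```

Each variant is a valid SMILES for the same molecule, which is exactly why training on randomized SMILES multiplies the effective size of the training set without adding new molecules.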


