Journal: Analytical Chemistry

Deep Learning to Generate in Silico Chemical Property Libraries and Candidate Molecules for Small Molecule Identification in Complex Samples


Abstract

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is inhibited by (i) chemical reference libraries (e.g., mass spectra, collision cross section, and other measurable property libraries) representing <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involved a cascade of transfer learning iterations. First, molecular representation is learned from a large data set of structures with m/z labels. Next, in silico property values are used to continue training, as experimental property data is limited. Finally, the network is further refined by being trained with the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller data sets without overfitting. 
Once trained, the network can be used to predict chemical properties directly from structure, as well as generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds, defined by chemical property analogues, positions DarkChem as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
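
The multitask objective described in the abstract (a VAE whose latent space is also shaped by a chemical property decoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the loss weights `kl_weight` and `prop_weight`, and the use of NaN to mark property labels that are unavailable at a given training stage are all assumptions made for illustration.

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) per sample, for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

def reconstruction_loss(probs, targets):
    """Cross-entropy between decoded token probabilities and one-hot targets.

    Both arrays have shape (batch, seq_len, vocab).
    """
    eps = 1e-12  # avoids log(0)
    return -np.sum(targets * np.log(probs + eps), axis=(1, 2))

def property_loss(pred, true):
    """Squared error over property labels, skipping entries marked NaN.

    Assumption: a missing label (e.g. no CCS value yet) is encoded as NaN.
    """
    mask = ~np.isnan(true)
    safe_true = np.where(mask, true, 0.0)
    diff = np.where(mask, pred - safe_true, 0.0)
    return np.sum(diff**2, axis=1)

def multitask_vae_loss(probs, targets, mu, log_var, prop_pred, prop_true,
                       kl_weight=1.0, prop_weight=1.0):
    """Mean over the batch of reconstruction + KL + property terms.

    kl_weight and prop_weight are hypothetical hyperparameters.
    """
    per_sample = (reconstruction_loss(probs, targets)
                  + kl_weight * kl_divergence(mu, log_var)
                  + prop_weight * property_loss(prop_pred, prop_true))
    return float(np.mean(per_sample))
```

Under these assumptions, the three training stages in the abstract would reuse the same loss while swapping the data set: structures with m/z labels only (other properties set to NaN), then in silico property values, then experimental values, each stage starting from the previous stage's weights.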
