Journal: AIChE Journal

Enhancing Molecular Discovery Using Descriptor-Free Rearrangement Clustering Techniques for Sparse Data Sets



Abstract

This article presents a descriptor-free method for estimating which library compounds have desired properties after synthesizing and assaying only a minimal portion of the library space. The method identifies the optimal substituent ordering (i.e., the optimal assignment of an encoding integer to each functional group at every substituent site of the molecular scaffold) based on a global pairwise difference metric intended to capture the smoothness of the compound library. The reordering can be accomplished via (i) a mixed-integer linear programming (MILP) model, (ii) a genetic-algorithm-based approach, or (iii) a heuristic approach. We present performance comparisons between these techniques as well as an independent analysis of the characteristics of the MILP model. Two sparsely sampled data matrices provided by Pfizer are analyzed to validate the proposed approach, and we show that rearranging these matrices leads to regular property landscapes that enable reliable property estimation/interpolation over the full library space. An iterative strategy for compound synthesis is also introduced that uses the results of the reordered data to direct synthesis toward desirable compounds. We demonstrate, in a simulated experiment using held-out subsets of the data, that the proposed iterative technique is effective in identifying compounds with desired physical properties.
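The reordering idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the library is taken to have two substituent sites and to be stored as a matrix with NaN marking unassayed compounds, the roughness score (sum of absolute differences between adjacent observed entries) is only a stand-in for the global pairwise difference metric, which the abstract does not define, and the pairwise-swap local search is a generic heuristic rather than the authors' MILP, genetic algorithm, or heuristic. The names `roughness` and `reorder` and the `demo` values are hypothetical and are not the Pfizer data.

```python
import itertools

import numpy as np


def roughness(m):
    """Stand-in smoothness score (an assumption, not the paper's metric):
    sum of |difference| between vertically and horizontally adjacent
    entries, skipping any pair that involves a missing (NaN) value."""
    down = np.abs(np.diff(m, axis=0))    # differences between row neighbours
    across = np.abs(np.diff(m, axis=1))  # differences between column neighbours
    return float(np.nansum(down) + np.nansum(across))


def reorder(data, max_passes=20):
    """Generic pairwise-swap local search over row and column orderings.
    A swap is kept only if it lowers the roughness score."""
    perms = [np.arange(data.shape[0]), np.arange(data.shape[1])]
    best = roughness(data)
    for _ in range(max_passes):
        improved = False
        for axis in (0, 1):
            n = len(perms[axis])
            for i, j in itertools.combinations(range(n), 2):
                trial = [p.copy() for p in perms]
                trial[axis][[i, j]] = trial[axis][[j, i]]  # swap two labels
                cost = roughness(data[np.ix_(*trial)])
                if cost < best - 1e-12:
                    perms, best, improved = trial, cost, True
        if not improved:
            break  # local optimum: no single swap helps
    return perms[0], perms[1], best


# Made-up demo: a smooth landscape with scrambled substituent labels
# and a few unassayed compounds.
smooth = np.add.outer(np.arange(4.0), np.arange(5.0))
demo = smooth[np.ix_([2, 0, 3, 1], [4, 1, 3, 0, 2])].copy()
demo[0, 1] = demo[2, 3] = demo[3, 0] = np.nan

row_order, col_order, cost = reorder(demo)
print("roughness before:", roughness(demo), "after:", cost)
print("row order:", row_order, "column order:", col_order)
```

A local search of this kind only guarantees that the score does not get worse; it can stall in a local optimum, which is one reason an exact formulation such as the MILP model mentioned in the abstract is useful as a reference point.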
