Journal of Chemical Information and Modeling

Data-Driven Derivation of an 'Informer Compound Set' for Improved Selection of Active Compounds in High-Throughput Screening

Abstract

Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. Compared to randomly selected training sets, the informer compound set improved BEDROC (α = 100), PRAUC, and ROCAUC values, averaged over all assays, by 0.024, 0.014, and 0.016, respectively, all with paired t-test p-values < 10^-15. A per-assay assessment showed that BEDROC (α = 100), which is of particular relevance for early retrieval of actives, improved for 38 of the 46 assays, increasing the success rate of smaller follow-up screens. Overall, we show that an informer set derived from historical HTS activity data can be employed for routine small-scale exploratory screening in an assay-agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds; it is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of training-set composition for predicting actives.
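The abstract reports BEDROC (α = 100), PRAUC, and ROCAUC without defining them. The minimal Python sketch below illustrates two of these metrics. It assumes the standard BEDROC definition of Truchon and Bayly (2007), normalized between the worst and best possible rankings, and the pairwise (Mann-Whitney) form of ROC AUC. The function names `bedroc` and `roc_auc` and the toy data are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import Sequence

# Hypothetical helpers: a sketch of the standard metric definitions,
# not the evaluation code used in the paper.


def bedroc(scores: Sequence[float], labels: Sequence[bool], alpha: float = 100.0) -> float:
    """BEDROC in [0, 1]; higher means actives are concentrated at the top of the ranking.

    Compounds are ranked by descending score (ties keep input order, since
    Python's sort is stable); labels[i] is True for an active compound.
    """
    n_total = len(scores)
    if n_total == 0 or n_total != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    order = sorted(range(n_total), key=lambda i: -scores[i])
    # 1-based ranks of the actives in the sorted list
    ranks = [r for r, idx in enumerate(order, start=1) if labels[idx]]
    n_act = len(ranks)
    if n_act == 0:
        raise ValueError("at least one active compound is required")
    ratio = n_act / n_total  # fraction of actives, R_a

    # RIE: observed exponentially weighted sum over its expectation under random ranking
    observed = sum(math.exp(-alpha * r / n_total) for r in ranks)
    expected = n_act * (1 - math.exp(-alpha)) / (n_total * math.expm1(alpha / n_total))
    rie = observed / expected

    # RIE for the best (all actives first) and worst (all actives last) orderings
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    if rie_max == rie_min:  # degenerate case: every compound is active
        return 1.0
    return (rie - rie_min) / (rie_max - rie_min)


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random active outscores a random inactive (ties count 1/2)."""
    act = [s for s, y in zip(scores, labels) if y]
    inact = [s for s, y in zip(scores, labels) if not y]
    if not act or not inact:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in act for b in inact)
    return wins / (len(act) * len(inact))


if __name__ == "__main__":
    # Toy ranking of 10 compounds: actives at positions 1 and 3
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    labels = [True, False, True] + [False] * 7
    print(roc_auc(scores, labels))  # 0.9375
    print(bedroc(scores, labels, alpha=20.0))
```

With α = 100, the weight e^(−αr/N) falls to about 37% (e^−1) of its initial value by rank r = N/100. BEDROC is therefore dominated by the first percent or so of the ranked list, which is why it suits early retrieval. ROC AUC, by contrast, weights every active-inactive pair equally.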