Journal: Knowledge and Information Systems

Comparison of descriptor spaces for chemical compound retrieval and classification

Abstract

In recent years, the development of computational techniques that build models to correctly assign chemical compounds to various classes or to retrieve potential drug-like compounds has been an active area of research. Many of the best-performing techniques for these tasks utilize a descriptor-based representation of the compound that captures various aspects of the underlying molecular graph's topology. In this paper we compare five different sets of descriptors that are currently used for chemical compound classification. We also introduce four different descriptors derived from all connected fragments present in the molecular graphs, primarily for the purpose of comparing them to the currently used descriptor spaces and analyzing which properties of descriptor spaces are helpful in providing an effective representation for molecular graphs. In addition, we introduce an extension to existing vector-based kernel functions that takes into account the length of the fragments present in the descriptors. We experimentally evaluate the performance of the previously introduced and the new descriptors in the context of SVM-based classification and ranked retrieval on 28 classification and retrieval problems derived from 18 datasets. Our experiments show that for both of these tasks, two of the four descriptors introduced in this paper, along with the extended connectivity fingerprint-based descriptors, consistently and statistically outperform previously developed schemes based on the widely used fingerprint- and MACCS keys-based descriptors, as well as recently introduced descriptors obtained by mining and analyzing the structure of the molecular graphs.
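
The abstract does not spell out how the kernel extension works, so the following is only an illustrative sketch of one way a vector-based kernel could be made aware of fragment length. It assumes each compound is a sparse map from a (fragment, length) pair to an occurrence count, uses the min-max (generalized Tanimoto) similarity as the base kernel, and averages that similarity over fragment lengths. The function names, the data layout, and the averaging rule are all assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict


def min_max_kernel(x, y):
    """Min-max (generalized Tanimoto) similarity of two sparse count vectors."""
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    den = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def split_by_length(desc):
    """Group a {(fragment, length): count} map into one sparse vector per length."""
    by_len = defaultdict(dict)
    for (frag, length), count in desc.items():
        by_len[length][frag] = count
    return by_len


def length_aware_kernel(x, y):
    """Hypothetical extension: average the base kernel over fragment lengths.

    Assumption: every length present in either compound gets equal weight.
    """
    xs, ys = split_by_length(x), split_by_length(y)
    lengths = set(xs) | set(ys)
    if not lengths:
        return 0.0
    total = sum(min_max_kernel(xs.get(l, {}), ys.get(l, {})) for l in lengths)
    return total / len(lengths)


# Toy descriptors: keys are (fragment label, number of bonds), values are counts.
a = {("C-C", 1): 3, ("C-O", 1): 1, ("C-C-O", 2): 1}
b = {("C-C", 1): 2, ("C-O", 1): 1, ("C-C-N", 2): 1}

print(min_max_kernel(a, b))       # all fragments pooled into one vector
print(length_aware_kernel(a, b))  # each fragment length compared separately
```

In this toy case the pooled kernel lets the well-matched short fragments mask the complete mismatch among the longer ones, whereas the per-length average gives each length an equal say. Either function could be used to fill a precomputed kernel matrix for an SVM.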
