Journal: Molecular Informatics

SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints



Abstract

Similarity networks of ligands are often reported to be useful for predicting chemical activities and target proteins. However, the naive method of computing all pairwise similarities of chemical fingerprints takes quadratic time, which is prohibitive for large-scale databases with millions of ligands. We propose a fast all-pairs similarity search method, called SketchSort, that maps chemical fingerprints to symbol strings with random projections and finds similar strings by multiple masked sorting. Because of the random projection, SketchSort misses a certain fraction of neighbors (i.e., false negatives). Nevertheless, the expected fraction of false negatives is theoretically derived and can be kept below a very small value. Experiments show that SketchSort is much faster than other similarity search methods and enables us to obtain a PubChem-scale similarity network quickly.
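The two stages named in the abstract (random projection to symbol strings, then multiple masked sorting) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes sign random projections to bit strings and a block-wise masking scheme, and the function names `sketch`, `hamming`, and `masked_sort_pairs` and all parameters are hypothetical. The masking step relies on a pigeonhole argument: if two strings differ in at most `max_dist` positions and are split into `n_blocks` blocks, they agree exactly on at least `n_blocks - max_dist` blocks, so sorting on every such choice of blocks brings each close pair together at least once.

```python
import random
from itertools import combinations, groupby


def sketch(fingerprints, n_bits, seed=0):
    """Map each fingerprint to a bit tuple via sign random projections (assumed scheme)."""
    rng = random.Random(seed)
    dim = len(fingerprints[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        tuple(int(sum(w * x for w, x in zip(plane, fp)) >= 0.0) for plane in planes)
        for fp in fingerprints
    ]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def masked_sort_pairs(sketches, n_blocks, max_dist):
    """Return all index pairs (i, j), i < j, whose sketches differ in <= max_dist positions."""
    length = len(sketches[0])
    bounds = [(b * length // n_blocks, (b + 1) * length // n_blocks)
              for b in range(n_blocks)]
    keep = max(0, n_blocks - max_dist)  # blocks that must match exactly
    found = set()
    for kept in combinations(range(n_blocks), keep):
        # Mask out the blocks not in `kept`; sort on the remaining ones.
        def key(i):
            return tuple(sketches[i][bounds[b][0]:bounds[b][1]] for b in kept)

        order = sorted(range(len(sketches)), key=key)
        # Equal keys are adjacent after sorting; verify pairs within each run.
        for _, run in groupby(order, key=key):
            for i, j in combinations(list(run), 2):
                if hamming(sketches[i], sketches[j]) <= max_dist:
                    found.add((min(i, j), max(i, j)))
    return found


# Usage with three toy fingerprints (illustrative data only).
fps = [[1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1, 0, 1]]
pairs = masked_sort_pairs(sketch(fps, n_bits=32), n_blocks=4, max_dist=3)
```

In this sketch the masked-sorting stage is exact with respect to Hamming distance on the bit strings; any missed neighbors come from the projection stage, where two similar fingerprints may receive strings that differ in more than `max_dist` positions.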
