Conference: International Conference on Similarity Search and Applications

On the Analysis of Compressed Chemical Fingerprints


Abstract

Chemical fingerprints are binary strings that represent the distinctive features of molecules in order to support efficient similarity search over chemical data. In large repositories, chemical fingerprints are conveniently stored in a compressed format, although the lossy compression process may introduce a systematic error in similarity measures. Simple correction formulae were proposed by Swamidass and Baldi in [13] to compensate for this error and thus improve similarity-based retrieval. The correction is based on estimating the weight (i.e., the number of bits set to 1) of a fingerprint before compression from its compressed value. Although the proposed correction has been substantiated by satisfactory experimental results, the way in which these estimates were derived and the approximations applied in [13] are not fully convincing and thus deserve further investigation. In this direction, the contribution of this work is to provide deeper insight into the fingerprint generation and compression process, which could constitute a more solid theoretical underpinning for the Swamidass and Baldi correction formulae.
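The abstract does not state the correction formulae themselves, so the following is only a minimal sketch of the kind of weight estimate it describes. It assumes that compression is modulo-OR folding to `m` bits and that the 1-bits of the original fingerprint fall uniformly at random over the folded positions. Under that assumption the expected compressed weight is `m * (1 - (1 - 1/m)**A)` for an original weight `A`, and inverting this expression gives an estimate of `A`. The function names `fold_or` and `estimate_weight` and all parameter values are hypothetical and chosen for illustration; they are not taken from [13].

```python
import math
import random


def fold_or(bit_positions, m):
    """Lossy compression (assumed scheme): fold 1-bit positions modulo m.

    Two original bits that collide on the same folded position are OR-ed
    together, so the compressed weight can only be <= the original weight.
    """
    return {p % m for p in bit_positions}


def estimate_weight(a, m):
    """Estimate the pre-compression weight from the compressed weight a.

    Inverts E[a] = m * (1 - (1 - 1/m)**A), which holds only under the
    uniform-random-bits assumption stated in the lead-in.
    """
    if a >= m:
        raise ValueError("saturated fingerprint: weight cannot be estimated")
    return math.log(1 - a / m) / math.log(1 - 1 / m)


# Illustrative parameters (assumptions, not values from the paper).
rng = random.Random(0)
n, m, true_weight = 2**20, 512, 200

original = set(rng.sample(range(n), true_weight))  # positions of the 1-bits
compressed = fold_or(original, m)
estimated = estimate_weight(len(compressed), m)

print("compressed weight:", len(compressed))
print("estimated original weight:", round(estimated, 1))
```

The estimate is never smaller than the observed compressed weight, because collisions can only remove bits. How well the uniformity assumption holds for real fingerprints is exactly the kind of question the abstract raises about the approximations in [13].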
