Similarity Search over Incomplete Symbolic Sequences

Abstract

Reliably measuring the similarity between symbolic sequences is an important problem in databases and data mining. Many distance functions for symbolic sequence data have been developed in recent years. However, most of them focus on the distance between complete symbolic sequences, while distance measurement for incomplete symbolic sequences remains unexplored. In this paper, we propose a method for similarity search over incomplete symbolic sequences. Without any knowledge of the positions and values of the missing elements, it is impossible to compute the exact distance between a query sequence and an incomplete sequence. Instead of calculating this exact distance, we map each pair of symbolic sequences to a real-valued interval; i.e., we propose a lower bound and an upper bound on the underlying exact distance between a query sequence and an incomplete sequence. With these bounds, similarity search can be conducted with guaranteed performance in terms of either recall or precision. The proposed method is also extended to handle real-valued sequence data. Experimental results on both synthetic and real-world data show that our method is both efficient and effective.
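The interval idea can be illustrated with a minimal sketch. This is not the paper's construction. It assumes three things: the distance is unit-cost Levenshtein edit distance; the incomplete sequence s' arises from an unknown complete sequence s by deleting exactly k symbols; and k is known, but the positions and values of those symbols are not. Under that model ed(s, s') = k, so the triangle inequality gives max(d - k, ||q| - (|s'| + k)|) <= ed(q, s) <= d + k, where d = ed(q, s'). A range query with threshold tau can then favour recall by keeping every candidate whose lower bound is at most tau, which never discards a true match. It can instead favour precision by keeping only candidates whose upper bound is at most tau, which never returns a false match. The functions `distance_interval` and `range_search` and the `(sequence, k)` database layout are hypothetical names introduced for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(b)]


def distance_interval(query: Sequence[str], incomplete: Sequence[str],
                      k: int) -> Tuple[int, int]:
    """Bounds on ed(query, s), where s is the unknown complete sequence.

    Hypothetical model: s is `incomplete` with exactly k symbols re-inserted
    at unknown positions, so ed(s, incomplete) == k.
    """
    d = edit_distance(query, incomplete)
    full_len = len(incomplete) + k
    # Triangle inequality plus the length-difference bound give the lower end.
    lower = max(d - k, abs(len(query) - full_len), 0)
    # Triangle inequality gives the upper end.
    upper = d + k
    return lower, upper


def range_search(query: Sequence[str],
                 database: List[Tuple[Sequence[str], int]],
                 tau: int, mode: str) -> List[int]:
    """Return indices of candidates judged within distance tau of `query`.

    mode == "recall":    keep lower <= tau (no true match is missed)
    mode == "precision": keep upper <= tau (every result is a true match)
    """
    if mode not in ("recall", "precision"):
        raise ValueError("mode must be 'recall' or 'precision'")
    results = []
    for idx, (seq, k) in enumerate(database):
        lo, hi = distance_interval(query, seq, k)
        bound = lo if mode == "recall" else hi
        if bound <= tau:
            results.append(idx)
    return results
```

For example, with query "abcd", tau = 1 and database [("abcd", 0), ("abd", 1), ("xyz", 1), ("ab", 0)], the recall-oriented search returns [0, 1] and the precision-oriented search returns [0]. The incomplete "abd" may or may not be a true match, so only the recall mode keeps it.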