Published in: International Conference on Web Information Systems Engineering

Keyword Search over Web Documents Based on Earth Mover's Distance

Abstract

Keyword search is widely used in many practical applications. Unfortunately, most keyword-based search engines compute the similarity distance between two Web documents by matching only the keywords at the same positions in the query and document vectors, without considering the impact of keywords at neighbouring positions. Such an approach usually yields incomplete search results. In this paper, we exploit the Earth Mover's Distance (EMD) as a distance function, which is more flexible than other distance functions such as the Euclidean distance. To overcome the high computational cost of EMD, we use filtering techniques to minimize the number of actual EMD computations. We further develop a novel lower bound as a new EMD filter for partial matching, which is suitable for searching Web documents. The experimental results demonstrate the efficiency of EMD-based search with filtering techniques.
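The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify its novel lower bound or its partial-matching formulation, so this sketch assumes equal-mass (normalized) keyword-weight vectors laid out on positions 0..n-1 with ground distance |i - j|, and it swaps in the well-known centroid lower bound as the filter. The names `emd_1d`, `centroid_lower_bound`, and `range_search` are hypothetical.

```python
from itertools import accumulate


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (equal-mass assumption)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weight vector must have positive total mass")
    return [w / total for w in weights]


def emd_1d(p, q):
    """Exact EMD between two normalized vectors on positions 0..n-1.

    Assumes ground distance |i - j|; in that one-dimensional case the EMD
    equals the sum of absolute differences of the cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum(abs(gap) for gap in accumulate(a - b for a, b in zip(p, q)))


def centroid_lower_bound(p, q):
    """Cheap lower bound on emd_1d: distance between the two centroids.

    This is a standard bound for equal-mass distributions, used here as a
    stand-in for the paper's own (unspecified) lower bound.
    """
    cp = sum(i * w for i, w in enumerate(p))
    cq = sum(i * w for i, w in enumerate(q))
    return abs(cp - cq)


def range_search(query, docs, radius):
    """Return documents within `radius` of the query, plus the number of
    exact EMD computations that the filter did not manage to avoid."""
    q = normalize(query)
    hits, exact_calls = [], 0
    for name, vec in docs.items():
        d = normalize(vec)
        # Filter step: if even the lower bound exceeds the radius,
        # the exact EMD must exceed it too, so skip the document.
        if centroid_lower_bound(q, d) > radius:
            continue
        # Refine step: compute the exact distance only for survivors.
        exact_calls += 1
        dist = emd_1d(q, d)
        if dist <= radius:
            hits.append((name, dist))
    return sorted(hits, key=lambda h: h[1]), exact_calls
```

With a query `[1, 0, 0, 0]` and a document `[0, 1, 0, 0]`, position-wise matching sees no overlap at all, whereas the EMD registers the keyword as only one position away. A document with its weight at the last position is discarded by the filter without an exact EMD computation.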
