Conference: International Conference on Future Data and Security Engineering

An Efficient Document Indexing-Based Similarity Search in Large Datasets



Abstract

In this paper, we propose a novel MapReduce-based approach for efficient similarity search in big data. Specifically, we address the drawbacks of using an inverted index for similarity search with MapReduce and propose a simple yet efficient redundancy-free MapReduce scheme, which not only has advantages over the baseline inverted-index-based procedures but also adapts to various similarity measures and similarity searches. Additionally, we present further strategies that help eliminate unnecessary data and computations. Finally, we conduct extensive empirical evaluations on real massive datasets, using the Hadoop framework on a cluster of commodity machines, to verify the proposed methods; the promising results show how beneficial they are when dealing with big data.
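For context, the baseline the abstract refers to, inverted-index-based similarity search with MapReduce, is commonly organized as two phases: one builds posting lists per term, and the other emits a partial score for every pair of documents sharing a term. The following is a minimal single-process sketch of that generic baseline, not the redundancy-free scheme proposed in the paper. The function names, the toy documents, and the use of a raw term-frequency dot product as the similarity measure are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_index(doc_id, text):
    """Phase 1 map: emit (term, (doc_id, term_frequency)) per distinct term."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_pairs(postings):
    """Phase 2 map: for one posting list, emit a partial product per document pair."""
    for (d1, tf1), (d2, tf2) in combinations(sorted(postings), 2):
        yield (d1, d2), tf1 * tf2


def similarity_scores(docs):
    """Run both phases and sum the partial products for each document pair."""
    index = shuffle(kv for doc_id, text in docs.items()
                    for kv in map_index(doc_id, text))
    partials = shuffle(kv for postings in index.values()
                       for kv in map_pairs(postings))
    return {pair: sum(values) for pair, values in partials.items()}


# Toy corpus (hypothetical data, for illustration only).
docs = {"d1": "big data search", "d2": "big data index", "d3": "graph theory"}
scores = similarity_scores(docs)
print(scores)
```

In this sketch, a pair of documents generates one intermediate record for every term they share, so the volume of shuffled data grows with the length of the posting lists. This is one kind of overhead that an inverted-index baseline can incur.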
