An Efficient Similarity Search in Large Data Collections with MapReduce

Abstract

The era of big data calls for innovation in similarity search computing. The unrelenting growth in data volume threatens both the processing capacity and the performance of existing information systems. To address the scalability challenge, we propose an efficient similarity search scheme for large data collections with MapReduce. We apply the proposed scheme to common similarity search cases, including pairwise similarity, search by example, range query, and k-Nearest Neighbor query. In addition, collaborative strategic refinements eliminate unnecessary computations and speed up the whole process. Finally, we evaluate our methods experimentally, alongside a previous work, on real large datasets to verify them.
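The query types named in the abstract can be illustrated with a small sketch. The Python code below is an in-memory simulation of a map, shuffle, and reduce pass that computes pairwise cosine similarity through an inverted index, then answers a range query and a k-Nearest Neighbor query from the resulting scores. It is not the authors' algorithm and does not include their refinements; the function names (`tf_vector`, `map_phase`, `shuffle`, `pairwise_cosine`, `neighbors`, `range_query`, `knn`) and the whitespace tokenization are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tf_vector(text):
    """Term-frequency vector, L2-normalized so a dot product equals cosine."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}  # empty document contributes nothing
    return {term: c / norm for term, c in counts.items()}


def map_phase(docs):
    """Map: emit (term, (doc_id, weight)) for every term of every document."""
    for doc_id, text in docs.items():
        for term, weight in tf_vector(text).items():
            yield term, (doc_id, weight)


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def pairwise_cosine(docs):
    """Reduce: per term, add the partial product for each pair sharing it."""
    scores = defaultdict(float)
    for postings in shuffle(map_phase(docs)).values():
        for (a, wa), (b, wb) in combinations(sorted(postings), 2):
            scores[(a, b)] += wa * wb
    return dict(scores)


def neighbors(scores, query_id):
    """All documents sharing a term with query_id, most similar first."""
    found = []
    for (a, b), score in scores.items():
        if a == query_id:
            found.append((b, score))
        elif b == query_id:
            found.append((a, score))
    return sorted(found, key=lambda item: -item[1])


def range_query(scores, query_id, threshold):
    """Documents whose cosine similarity to query_id is at least threshold."""
    return [(d, s) for d, s in neighbors(scores, query_id) if s >= threshold]


def knn(scores, query_id, k):
    """The k documents most similar to query_id."""
    return neighbors(scores, query_id)[:k]


if __name__ == "__main__":
    docs = {
        "d1": "map reduce hadoop",
        "d2": "map reduce hadoop",
        "d3": "cosine similarity",
        "d4": "hadoop cosine",
    }
    scores = pairwise_cosine(docs)
    print(knn(scores, "d1", 1))
    print(range_query(scores, "d4", 0.45))
```

Document pairs that share no term never meet in a reducer, so they are never scored; this is one simple way an inverted index avoids unnecessary comparisons, independent of whatever refinements the paper itself uses.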

Bibliographic record

Source: International Conference on Future Data and Security Engineering, 2014, pp. 44-57 (14 pages)
Authors: Trong Nhan Phan; Josef Kueng; Tran Khanh Dang
Original format: PDF
Keywords: Similarity search; large datasets; MapReduce; Cosine; Hadoop

Similar literature

1. MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees [J]. Suzanne J Matthews, Tiffani L Williams. BMC Bioinformatics. 2010, Issue SUPPLEMENTa1
2. Live data migration approach from relational tables to schema-free collections with MapReduce [J]. Kun Ma, Fusen Dong. International Journal of Services Technology and Management. 2015, Issue 4a6
3. Selective Search: Efficient and Effective Search of Large Textual Collections [J]. Kulkarni Anagha, Callan Jamie. ACM Transactions on Information Systems. 2015, Issue 4
4. An Efficient Similarity Search in Large Data Collections with MapReduce [C]. Trong Nhan Phan, Josef Kueng, Tran Khanh Dang. International Conference on Future Data and Security Engineering. 2014
5. ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce [D]. Lakshminarayanan, Mahalakshmi. 2013
6. MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees [O]. Suzanne J Matthews, Tiffani L Williams. 2010
7. Pairwise Document Similarity in Large Collections with MapReduce [O]. Tamer Elsayed, Jimmy Lin, Douglas W. Oard. 2009
8. Similarity Search in Large Collections of Biometric Data [R]. Zezula, P., Batko, M., Dohnal, V. 2009
