International Conference on Circuits, Power and Computing Technologies

Locality Sensitive Hashing based Incremental Clustering for Creating Affinity Groups in Hadoop - HDFS - An Infrastructure Extension


Abstract

Apache's Hadoop is an open source framework for large scale data analysis and storage. It is an open source implementation of Google's Map/Reduce framework. It enables distributed, data intensive and parallel applications by decomposing a massive job into smaller tasks and a massive data set into smaller partitions, such that each task processes a different partition in parallel. Hadoop uses the Hadoop Distributed File System (HDFS), an open source implementation of the Google File System (GFS), for storing data. Map/Reduce applications mainly use HDFS for storing data. HDFS is a very large distributed file system that assumes commodity hardware and provides high throughput and fault tolerance. HDFS stores files as a series of blocks, which are replicated for fault tolerance. The default block placement strategy does not consider the data characteristics and places the data blocks randomly. Customized strategies can improve the performance of HDFS to a great extent. Applications using HDFS require streaming access to files, and performance can be increased if related files are placed on the same set of data nodes. This paper discusses a method for clustering streaming data onto the same set of data nodes using the technique of Locality Sensitive Hashing. The method uses the compact bitwise representation of document vectors, called fingerprints, created using Locality Sensitive Hashing to increase data processing speed and performance. The process does not affect the default fault tolerance properties of Hadoop and requires only minimal changes to the Hadoop framework.
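
To make the approach concrete, the following is a minimal, self-contained Java sketch of the general technique the abstract describes, not the authors' implementation: random-hyperplane Locality Sensitive Hashing maps each document vector to a compact bitwise fingerprint, and a simple incremental clusterer groups incoming fingerprints by Hamming distance. The class and method names, the 64-bit fingerprint width, and the Hamming threshold are illustrative assumptions; in the paper's setting, each resulting cluster would correspond to one affinity group of HDFS data nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch (not the paper's exact algorithm): random-hyperplane LSH
 * turns a document vector into a compact bitwise fingerprint, and an incremental
 * clusterer groups fingerprints by Hamming distance. Each cluster id would name
 * an affinity group, i.e. a set of data nodes that stores the related blocks.
 */
public class LshAffinityGrouping {

    private final double[][] hyperplanes;                     // one random hyperplane per fingerprint bit
    private final List<Long> clusterReps = new ArrayList<>(); // representative fingerprint per cluster
    private final int hammingThreshold;                       // assumed threshold for joining an existing cluster

    public LshAffinityGrouping(int numBits, int dimensions, int hammingThreshold, long seed) {
        Random rng = new Random(seed);
        this.hammingThreshold = hammingThreshold;
        hyperplanes = new double[numBits][dimensions];
        for (int b = 0; b < numBits; b++)
            for (int d = 0; d < dimensions; d++)
                hyperplanes[b][d] = rng.nextGaussian();
    }

    /** Each fingerprint bit records which side of a random hyperplane the vector falls on. */
    public long fingerprint(double[] docVector) {
        long bits = 0L;
        for (int b = 0; b < hyperplanes.length; b++) {
            double dot = 0.0;
            for (int d = 0; d < docVector.length; d++)
                dot += hyperplanes[b][d] * docVector[d];
            if (dot >= 0) bits |= 1L << b;
        }
        return bits;
    }

    /** Hamming distance between fingerprints approximates the angular distance of the vectors. */
    private static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /**
     * Incremental clustering: join the nearest existing cluster if it is within the
     * Hamming threshold, otherwise open a new cluster. The returned id identifies the
     * affinity group whose data nodes would receive the document's blocks.
     */
    public int assignAffinityGroup(double[] docVector) {
        long fp = fingerprint(docVector);
        int best = -1, bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < clusterReps.size(); i++) {
            int d = hamming(fp, clusterReps.get(i));
            if (d < bestDist) { bestDist = d; best = i; }
        }
        if (best >= 0 && bestDist <= hammingThreshold) return best;
        clusterReps.add(fp);
        return clusterReps.size() - 1;
    }

    public static void main(String[] args) {
        LshAffinityGrouping grouper = new LshAffinityGrouping(64, 4, 16, 42L);
        System.out.println(grouper.assignAffinityGroup(new double[]{1.0, 0.9, 0.1, 0.0})); // first document -> group 0
        System.out.println(grouper.assignAffinityGroup(new double[]{0.9, 1.0, 0.0, 0.1})); // similar document -> group 0
        System.out.println(grouper.assignAffinityGroup(new double[]{0.0, 0.1, 1.0, 0.9})); // dissimilar -> likely a new group
    }
}
```

Because each incoming document is compared only against one representative fingerprint per cluster, the assignment cost grows with the number of clusters rather than the number of stored documents, which is what makes the clustering incremental and suitable for streaming data.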
