Deduplication using nearest neighbor cluster
Abstract
Disclosed are techniques for data deduplication, including methods, systems, or computer products for reducing data redundancy in a data storage system. Before a data block is written, a cluster of nearest neighbors, created using a locality-sensitive hashing algorithm, is searched to determine whether the data block has already been stored in the data storage system. In alternate embodiments, the nearest neighbor clusters could be created using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moments (GMM) algorithm, or a density-based spatial clustering of applications with noise (DBSCAN) algorithm.
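The lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes a byte-histogram feature vector, random-hyperplane (sign-of-projection) locality-sensitive hashing, and an in-memory dictionary standing in for the storage system. The names `LshDedupStore`, `num_planes`, and `write` are hypothetical and chosen only for illustration.

```python
import random


class LshDedupStore:
    """Toy in-memory block store that checks an LSH cluster before writing."""

    def __init__(self, num_planes=16, seed=0):
        rng = random.Random(seed)
        # Assumption: each hyperplane is a 256-dimensional Gaussian vector,
        # one weight per possible byte value.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(256)]
            for _ in range(num_planes)
        ]
        # Maps an LSH signature to the list of blocks in that cluster.
        self.clusters = {}

    def _signature(self, block):
        # Assumption: a byte histogram serves as the block's feature vector.
        hist = [0] * 256
        for byte in block:
            hist[byte] += 1
        sig = 0
        for plane in self.planes:
            dot = sum(w * c for w, c in zip(plane, hist))
            # One signature bit per hyperplane: which side the vector falls on.
            sig = (sig << 1) | (1 if dot >= 0 else 0)
        return sig

    def write(self, block):
        """Store the block unless an identical one is already in its cluster.

        Returns True if the block was newly stored, False if it was a duplicate.
        """
        cluster = self.clusters.setdefault(self._signature(block), [])
        # Only the block's own cluster is searched, not the whole store.
        if any(stored == block for stored in cluster):
            return False
        cluster.append(block)
        return True
```

Because the signature is a deterministic function of the block's contents, identical blocks always land in the same cluster, so searching that cluster is enough to detect an exact duplicate in this sketch. Blocks with similar byte distributions tend to share a cluster, which is why the final byte-for-byte comparison is still needed.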