Deduplication using nearest neighbor cluster

Abstract

Disclosed are techniques for data deduplication, including methods, systems, or computer products for reducing data redundancy in a data storage system. Before a data block is written, a cluster of nearest neighbors, created using a locality-sensitive hashing algorithm, is searched to determine whether the data block has already been stored in the data storage system. In alternate embodiments, the nearest neighbor clusters may be created using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moments (GMM) algorithm, or a density-based spatial clustering of applications with noise (DBSCAN) algorithm.
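The abstract does not specify how blocks are hashed or compared, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses MinHash signatures over byte shingles with banded LSH (one common locality-sensitive hashing scheme, chosen here as an assumption) to group stored blocks into buckets, and then searches only the incoming block's candidate cluster before writing. The names `DedupStore`, `NUM_HASHES`, `BANDS`, and `SHINGLE` are hypothetical, and this is not the patented implementation.

```python
import hashlib
from collections import defaultdict

# Hypothetical tuning parameters (not taken from the patent).
NUM_HASHES = 32   # MinHash signature length
BANDS = 8         # LSH bands; each band covers NUM_HASHES // BANDS rows
SHINGLE = 4       # width of the overlapping byte shingles


def _shingles(block: bytes) -> set[bytes]:
    """Split a block into overlapping SHINGLE-byte windows."""
    if len(block) <= SHINGLE:
        return {block}
    return {block[i:i + SHINGLE] for i in range(len(block) - SHINGLE + 1)}


def _minhash(block: bytes) -> list[int]:
    """MinHash signature: for each seeded hash, the minimum over all shingles."""
    shingles = _shingles(block)
    signature = []
    for seed in range(NUM_HASHES):
        prefix = seed.to_bytes(4, "little")  # seed makes each hash function distinct
        signature.append(min(
            int.from_bytes(hashlib.blake2b(prefix + s, digest_size=8).digest(), "little")
            for s in shingles
        ))
    return signature


def _band_keys(signature: list[int]) -> list[tuple[int, tuple[int, ...]]]:
    """Banded LSH: similar signatures are likely to agree on at least one band."""
    rows = NUM_HASHES // BANDS
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(BANDS)]


class DedupStore:
    """Hypothetical block store that searches an LSH cluster before each write."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []                         # block id -> stored bytes
        self.buckets: dict[tuple, set[int]] = defaultdict(set)  # band key -> block ids

    def write(self, block: bytes) -> tuple[int, bool]:
        """Return (block_id, was_duplicate); store the block only if it is new."""
        keys = _band_keys(_minhash(block))
        # Candidate cluster: every stored block sharing at least one LSH bucket.
        cluster = set().union(*(self.buckets.get(k, set()) for k in keys))
        # Identical bytes give an identical signature, so an exact duplicate is
        # always in the cluster; the byte comparison rejects merely similar blocks.
        for block_id in sorted(cluster):
            if self.blocks[block_id] == block:
                return block_id, True
        block_id = len(self.blocks)
        self.blocks.append(block)
        for k in keys:
            self.buckets[k].add(block_id)
        return block_id, False
```

For example, writing `b"hello world, this is block A"` twice returns `(0, False)` and then `(0, True)`, so only one physical copy is kept. The point of the LSH step is that each write is compared against a small cluster of likely neighbors rather than the whole store. In this sketch, an alternative clustering method (such as k-means or DBSCAN, as the abstract lists) would take the place of `_band_keys` in assigning blocks to clusters.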

Bibliographic data

Publication No.: US11029871B2

Publication date: 2021-06-08

Original format: PDF
Applicant/Assignee: EMC IP HOLDING COMPANY LLC

Application No.: US201916412946
Inventors: JONATHAN KRASNER; SWEETESH SINGH; STEVEN CHALMER

Filing date: 2019-05-15
Classification: G06F3/06; G06N20; H04L9/06
Country: US
Date added to database: 2022-08-24 19:05:32
