A HEURISTIC METHOD AND SYSTEM TO OPTIMIZE DATA STORAGE AND PROBABILISTIC SIMILARITY APPROACH IN DATA DEDUPLICATION.

Abstract

The present invention provides a method for resolving the major problem of limited space availability in data storage. The solution prevents duplicate data from being stored and archived. The storage optimization technique includes content-aware and application-aware data storage. A clustered storage interface balances data scalability with minimal metadata communication overhead across the repository and handles resource failures with proactive measures. The repository spans the pool members of a storage pool so that similar files can be detected and a high deduplication rate achieved. The solution comprises deterministic and unsupervised probabilistic duplicate-detection models that use a similarity distance metric to measure resemblance, together with a uniform file distribution that avoids data skew. Incoming traffic reaches a storage pool associated with multiple pool members, and each file is sent to the pool member chosen on the basis of a high-probability similarity score and the availability of disk space. In addition, a queuing system handles ongoing backup instances during failover, which helps avoid data loss. The approach shows a significant improvement in freeing storage space with lower communication and processing overhead.
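The abstract names the inputs to the placement decision (a similarity score and available disk space) but not the similarity metric, the chunking scheme, or the tie-breaking rule. The Python sketch below illustrates one plausible reading of the routing step under stated assumptions: files are split into fixed-size chunks fingerprinted with SHA-256, and a containment score (the fraction of a file's distinct chunks already held by a pool member) stands in for the unspecified similarity distance metric. `PoolMember`, `fingerprints`, `similarity`, `route`, and `CHUNK_SIZE` are hypothetical names, not taken from the patent.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (assumption)


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict[str, int]:
    """Map the SHA-256 digest of each fixed-size chunk to that chunk's length."""
    fps: dict[str, int] = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fps[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return fps


@dataclass
class PoolMember:
    """Hypothetical storage pool member with a chunk-fingerprint index."""
    name: str
    free_bytes: int
    index: set[str] = field(default_factory=set)

    def new_bytes(self, fps: dict[str, int]) -> int:
        """Bytes that storing the file here would actually write (unseen chunks only)."""
        return sum(size for digest, size in fps.items() if digest not in self.index)

    def store(self, fps: dict[str, int]) -> int:
        """Write only unseen chunks, update index and free space, return bytes written."""
        written = self.new_bytes(fps)
        self.index.update(fps)  # adds the digests (dict keys) to the index
        self.free_bytes -= written
        return written


def similarity(fps: dict[str, int], member: PoolMember) -> float:
    """Containment score: fraction of the file's distinct chunks already on the member."""
    if not fps:
        return 0.0
    return sum(1 for digest in fps if digest in member.index) / len(fps)


def route(members: list[PoolMember], fps: dict[str, int]) -> PoolMember:
    """Choose the most similar member that has room for the file's new chunks.

    If no member shares any chunk with the file, fall back to the member with
    the most free space so that dissimilar files spread across the pool.
    """
    candidates = [m for m in members if m.free_bytes >= m.new_bytes(fps)]
    if not candidates:
        raise RuntimeError("no pool member has enough free space")
    best = max(candidates, key=lambda m: similarity(fps, m))
    if similarity(fps, best) == 0.0:
        best = max(candidates, key=lambda m: m.free_bytes)
    return best


if __name__ == "__main__":
    a = PoolMember("A", free_bytes=1_000_000)
    b = PoolMember("B", free_bytes=1_000_001)
    pool = [a, b]

    doc = fingerprints(b"x" * 10_000)
    first = route(pool, doc)  # no overlap anywhere: B has the most free space
    print(first.name, first.store(doc))  # B 5904
    again = route(pool, doc)  # identical file: B already holds every chunk
    print(again.name, again.store(doc))  # B 0
    other = route(pool, fingerprints(b"y" * 5_000))  # dissimilar file: A now has more room
    print(other.name)  # A
```

In this sketch a file that shares no chunks with any member goes to the member with the most free space, which is one simple way to approximate the uniform distribution the abstract mentions; the failover queuing system is not modeled.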

机译：本发明提供了一种解决数据存储中空间可用性的主要问题的方法。该解决方案可防止重复数据的存储和存档。用于优化存储的技术包括内容和应用程序感知的特定数据存储。考虑使用集群存储接口来平衡数据可伸缩性和存储库上的最小元数据通信开销，并通过主动措施来处理资源故障。存储库跨存储池中的池成员覆盖，以检测相似文件并实现高重复数据删除率。该解决方案包括确定性和无监督概率重复检测模型，该模型使用相似性距离度量来实现相似性和统一文件分发，从而避免数据偏斜。流量冲击存储池;其中，该池与多个池成员相关联，并且根据相似度评分和磁盘空间的高可能性将文件发送到相应的池成员。此外，还为故障转移上的正在进行的备份实例开发了排队系统。它在避免数据丢失方面有好处。它在释放存储空间方面显示出显着的改进，同时减少了通信和处理开销。

Bibliographic data

Publication number: IN201721027920A
Publication date: 2017-09-08
Original format: PDF
Application number: IN201721027920
Inventors: JYOTI J MALHOTRA; JAGDISH W BAKAL
Application date: 2017-08-05
IPC classification: G06F17/00
Country: IN