The present invention provides a method for addressing the problem of limited space availability in data storage. The solution prevents duplicate data from being stored and archived. Storage is optimized through content-aware and application-aware data placement. A clustered storage interface balances data scalability against minimal metadata communication overhead across the repository and handles resource failures proactively. The repository spans the pool members of a storage pool so that similar files can be detected and a high deduplication rate achieved. The solution comprises a deterministic and unsupervised probabilistic duplicate detection model that uses a similarity distance metric for resemblance, together with uniform file distribution that avoids data skew. Incoming traffic arrives at the storage pool, which is associated with multiple pool members, and each file is sent to a pool member chosen on the basis of a high similarity score and the availability of disk space. In addition, a queuing system maintains ongoing backup instances during failover, which avoids data loss. The method significantly improves the freeing of storage space while incurring less communication and processing overhead.
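The routing step described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `PoolMember`, `route`, `shingles`, and `resemblance`, the 4-byte shingle size, the 0.3 threshold, and the use of SHA-256 with a shingle-containment score are all assumptions chosen for illustration. An exact digest match stands in for the deterministic check, the shingle overlap stands in for the probabilistic similarity metric, and the fallback to the member with the most free space stands in for skew-avoiding distribution.

```python
import hashlib
from dataclasses import dataclass, field


def shingles(data: bytes, k: int = 4) -> frozenset:
    """Return the set of k-byte shingles of data (hypothetical feature choice)."""
    if len(data) <= k:
        return frozenset([data])
    return frozenset(data[i:i + k] for i in range(len(data) - k + 1))


def resemblance(file_feats: frozenset, member_feats: set) -> float:
    """Fraction of the file's shingles already present on a member (assumed metric)."""
    return len(file_feats & member_feats) / len(file_feats)


@dataclass
class PoolMember:
    name: str
    free_bytes: int
    digests: set = field(default_factory=set)   # exact-content fingerprints
    features: set = field(default_factory=set)  # shingles of stored files

    def store(self, data: bytes, digest: str) -> None:
        self.digests.add(digest)
        self.features |= shingles(data)
        self.free_bytes -= len(data)


def route(data: bytes, pool: list, threshold: float = 0.3):
    """Return (member name, reason) for an incoming file."""
    digest = hashlib.sha256(data).hexdigest()

    # Deterministic check: identical content is never stored twice.
    for member in pool:
        if digest in member.digests:
            return member.name, "duplicate"

    # Only members with enough free disk space are candidates.
    candidates = [m for m in pool if m.free_bytes >= len(data)]
    if not candidates:
        raise RuntimeError("no pool member has enough free space")

    # Probabilistic check: prefer the member holding the most similar content,
    # breaking ties in favour of the member with more free space.
    feats = shingles(data)
    best = max(candidates, key=lambda m: (resemblance(feats, m.features), m.free_bytes))
    if resemblance(feats, best.features) >= threshold:
        reason = "similar"
    else:
        # No sufficiently similar member: place by free space to limit skew.
        best = max(candidates, key=lambda m: m.free_bytes)
        reason = "balanced"

    best.store(data, digest)
    return best.name, reason
```

In this sketch, a first file sent to an empty pool lands on the member with the most free space, a byte-identical resend is reported as a duplicate without being stored, and a near-identical file is routed to the member that already holds its close match, provided that member still has room.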