Journal: 计算机应用研究 (Application Research of Computers)

A Storage Architecture for Massive MP3 Files Using Acoustic Fingerprint De-duplication

     

Abstract

Hadoop itself is not well suited to handling massive numbers of small files, and current data de-duplication methods rely mainly on the binary characteristics of files, so they cannot recognize the same song after it has undergone signal processing, nor can they meet the requirements of online processing of massive data. This paper presents a storage architecture for massive MP3 files that de-duplicates using acoustic fingerprints. By combining the acoustic characteristics of the music itself with the meta-information contained in MP3 files, and by de-duplicating through an index, online merging, and NAF, the architecture effectively relieves the memory bottleneck caused by too many small files while providing a better de-duplication effect. Offline merging and a replica adjustment module continually optimize storage according to the operating conditions of the system. Experimental results show that the architecture achieves a good balance among performance, de-duplication rate, manageability, and scalability. It greatly improves the de-duplication rate, which is 100% higher than that of variable-size chunking (CDC), and has good practical value.
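The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of index-based de-duplication combined with online merging of small files into larger containers. It assumes the acoustic fingerprint has already been computed elsewhere and is passed in as a string. The names `DedupStore`, `Location`, and `container_limit` are hypothetical, and NAF, offline merging, and replica adjustment are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Location:
    """Where one song's bytes live inside a merged container (hypothetical layout)."""
    container_id: int
    offset: int
    length: int


@dataclass
class DedupStore:
    """In-memory stand-in for a fingerprint index plus merged container files."""
    container_limit: int = 64 * 1024 * 1024  # assumed container size, not from the paper
    index: Dict[str, Location] = field(default_factory=dict)
    containers: List[bytearray] = field(default_factory=lambda: [bytearray()])

    def put(self, fingerprint: str, data: bytes) -> Tuple[Location, bool]:
        """Store data unless the fingerprint is already indexed.

        Returns the location and a flag that is True when the upload
        was recognized as a duplicate and nothing new was written.
        """
        existing = self.index.get(fingerprint)
        if existing is not None:
            return existing, True

        # Online merge: append small files to the current container and
        # open a new one only when the current container would overflow.
        current = self.containers[-1]
        if current and len(current) + len(data) > self.container_limit:
            self.containers.append(bytearray())

        container_id = len(self.containers) - 1
        offset = len(self.containers[container_id])
        self.containers[container_id].extend(data)

        location = Location(container_id, offset, len(data))
        self.index[fingerprint] = location
        return location, False

    def get(self, fingerprint: str) -> bytes:
        """Read a stored song back by its fingerprint."""
        loc = self.index[fingerprint]
        container = self.containers[loc.container_id]
        return bytes(container[loc.offset:loc.offset + loc.length])
```

In this sketch two uploads with the same fingerprint map to one stored copy even when their bytes differ, which is the property that binary-level de-duplication cannot provide for re-encoded songs. Packing many small files into a few containers also keeps the number of objects the file system must track small.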
