Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Fast Audio Fingerprinting System Using GPU and a Clustering-Based Technique

Abstract

In this paper, we present our audio fingerprinting system, which detects a transformed copy of an audio recording within a large database of audio recordings. The audio fingerprints in this system encode the positions of salient regions of binary images derived from a spectrogram matrix. The similarity between two fingerprints is defined as the intersection of their elements (i.e., the positions of the salient regions). The search algorithm labels each reference fingerprint in the database with the closest query frame and then counts the number of matching frames when the query is overlaid on the reference. The best match is based on this count. The salient-region fingerprints together with this nearest-neighbor search give excellent copy detection results. However, for a large database, this search is time-consuming. To reduce the search time, we accelerate the similarity search by using a graphics processing unit (GPU). To speed up the search even further, we use a two-step search based on a clustering technique and a lookup table that reduces the number of comparisons between the query and the reference fingerprints. We also explore the tradeoff between search speed and copy detection performance. The resulting system achieves excellent results on the TRECVID 2009 and 2010 datasets and outperforms several state-of-the-art audio copy detection systems in detection performance, localization accuracy, and run time. For a fast detection scenario with detection speed comparable to Ellis’ Shazam-based system, our system achieved the same min NDCR as the NN-based system and significantly better detection accuracy than Ellis’ Shazam-based system.
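
The nearest-neighbor search described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: the representation of a fingerprint as a set of integer positions, the scoring of the intersection by its size, the `-1` label for reference frames that share nothing with the query, and all function names are assumptions made for illustration. The GPU acceleration and the clustering-based two-step search are not shown.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: one fingerprint = the set of salient-region positions of one frame.
Fingerprint = FrozenSet[int]


def similarity(a: Fingerprint, b: Fingerprint) -> int:
    """Count the positions shared by two fingerprints (size of the intersection)."""
    return len(a & b)


def label_reference(reference: Sequence[Fingerprint],
                    query: Sequence[Fingerprint]) -> List[int]:
    """Label each reference frame with the index of its closest query frame.

    Assumes a non-empty query. A reference frame that shares no position
    with any query frame gets the label -1 (an illustrative choice).
    """
    labels = []
    for ref_fp in reference:
        scores = [similarity(ref_fp, q_fp) for q_fp in query]
        best = max(range(len(query)), key=scores.__getitem__)
        labels.append(best if scores[best] > 0 else -1)
    return labels


def best_alignment(reference: Sequence[Fingerprint],
                   query: Sequence[Fingerprint]) -> Tuple[int, int]:
    """Overlay the query on the reference at every offset and count matches.

    A reference frame at position offset + q matches when its label is q.
    Returns (best_offset, match_count); (0, -1) if the reference is shorter
    than the query.
    """
    labels = label_reference(reference, query)
    best_offset, best_count = 0, -1
    for offset in range(len(reference) - len(query) + 1):
        count = sum(1 for q in range(len(query)) if labels[offset + q] == q)
        if count > best_count:
            best_offset, best_count = offset, count
    return best_offset, best_count


if __name__ == "__main__":
    query = [frozenset({1, 2, 3}), frozenset({10, 11}), frozenset({20, 21, 22})]
    reference = [
        frozenset({50}), frozenset({60, 61}),
        frozenset({1, 2, 9}), frozenset({10, 11, 12}), frozenset({20, 22}),
        frozenset({70}),
    ]
    print(best_alignment(reference, query))
```

In this toy data the query frames are slightly altered copies of reference frames 2 to 4, so the overlay count peaks at offset 2. Labeling each reference frame once and then sliding the query over the labels keeps the per-offset work to a simple equality check, which is the part of the search the abstract describes as counting matching frames.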