Journal: Knowledge and Information Systems

On classification and segmentation of massive audio data streams



Abstract

In recent years, the proliferation of VoIP data has created a number of applications in which it is desirable to perform quick online classification and recognition of massive voice streams. Such applications are typically encountered in real-time intelligence and surveillance. In many cases, the data streams are in compressed format, and the required processing rate can reach gigabits per second. All known techniques for speaker voice analysis require an offline training phase in which the system is trained with known segments of speech. The state-of-the-art method for text-independent speaker recognition is Gaussian mixture modeling (GMM), which requires an iterative expectation-maximization procedure for training and therefore cannot be implemented in real time. In many real applications (such as surveillance) it is desirable to perform recognition online, so that the system can be quickly adapted to new segments of the data. In many cases, it may also be desirable to quickly create databases of training profiles for speakers of interest. In this paper, we discuss the details of such an online voice recognition system. For this purpose, we use our micro-clustering algorithms to design concise signatures of the target speakers. One of the surprising and insightful observations from our experience with this system is that, although it was originally designed only for efficiency, it also turned out to be more accurate than the widely used GMM. The reason is the conciseness of the micro-cluster model, which makes it less prone to overtraining. This suggests that it is often possible to get the best of both worlds and outperform complex models in both efficiency and accuracy. We present experimental results illustrating the effectiveness and efficiency of the method.
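The abstract does not spell out the micro-clustering algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each speech frame has already been reduced to a numeric feature vector, and it keeps a small, fixed number of additive cluster summaries (count, linear sum, squared sum) per speaker, updated in one pass. The names `MicroCluster`, `SpeakerSignature`, `classify`, and the parameters `max_clusters` and `new_cluster_dist` are hypothetical choices made for illustration.

```python
import math


class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of absorbed vectors."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)
        self.square_sum = [x * x for x in point]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # RMS deviation from the centroid, computed from the stored sums only.
        var = sum(sq / self.n - (s / self.n) ** 2
                  for s, sq in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        # Constant-time update; no past points need to be kept.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SpeakerSignature:
    """Concise speaker profile: a bounded set of micro-clusters (assumed design)."""

    def __init__(self, max_clusters=8, new_cluster_dist=1.0):
        self.max_clusters = max_clusters          # hypothetical budget
        self.new_cluster_dist = new_cluster_dist  # hypothetical threshold
        self.clusters = []

    def update(self, point):
        if not self.clusters:
            self.clusters.append(MicroCluster(point))
            return
        nearest = min(self.clusters,
                      key=lambda c: distance(c.centroid(), point))
        far = distance(nearest.centroid(), point) > self.new_cluster_dist
        if far and len(self.clusters) < self.max_clusters:
            self.clusters.append(MicroCluster(point))
        else:
            nearest.absorb(point)

    def score(self, points):
        # Mean distance from each point to its nearest centroid (lower is closer).
        centroids = [c.centroid() for c in self.clusters]
        total = sum(min(distance(c, p) for c in centroids) for p in points)
        return total / len(points)


def classify(signatures, points):
    """Return the name of the signature that best matches the test points."""
    return min(signatures, key=lambda name: signatures[name].score(points))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real acoustic features.
    signatures = {"alice": SpeakerSignature(), "bob": SpeakerSignature()}
    for p in [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)]:
        signatures["alice"].update(p)
    for p in [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]:
        signatures["bob"].update(p)
    print(classify(signatures, [(0.1, 0.1), (0.0, 0.2)]))  # prints "alice"
```

Because each update touches only one cluster's running sums, a profile can be built or extended as data arrives, without the repeated passes over the training data that expectation maximization needs. The small cluster budget is also what keeps the signature concise.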
