2011 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Fast speaker diarization using a high-level scripting language


Abstract

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine “who spoke when” in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits the performance of current approaches to be roughly real-time. Increased sizes of current datasets require processing of hundreds of hours of data and thus make more efficient processing methods highly desirable. With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), one can re-implement GMM training to achieve faster than real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining the complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor. Furthermore, such low-level implementations are not readily reusable in other applications and not portable to other platforms, limiting programmer productivity. In this paper we present a speaker diarization system captured in under 50 lines of Python that achieves 50–250× faster than real-time performance by using a specialization framework to automatically map and execute computationally intensive GMM training on an NVIDIA GPU, without significant loss in accuracy.
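The computational bottleneck the abstract identifies is EM training of Gaussian Mixture Models over the audio frames. As an illustration of that core computation (not the authors' GPU-specialized system), here is a minimal diagonal-covariance GMM trained with EM in plain NumPy; the function name and the quantile-based initialization are choices made for this sketch:

```python
import numpy as np

def train_gmm(X, n_components=2, n_iter=50):
    """Illustrative EM training of a diagonal-covariance GMM.

    X: (n_frames, n_dims) array of feature vectors (e.g. MFCCs).
    Returns (weights, means, variances).
    """
    n, d = X.shape
    # Deterministic init: spread component means along the
    # highest-variance feature dimension.
    dim = X.var(axis=0).argmax()
    order = np.argsort(X[:, dim])
    idx = order[np.linspace(0, n - 1, n_components).astype(int)]
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: per-frame log-likelihood under each component.
        diff2 = (X[:, None, :] - means[None]) ** 2        # (n, k, d)
        log_p = (-0.5 * (np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                         + (diff2 / variances[None]).sum(axis=2))
                 + np.log(weights)[None, :])               # (n, k)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)                    # responsibilities

        # M-step: re-estimate weights, means, variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-6
    return weights, means, variances
```

In a diarization pipeline this training step runs repeatedly inside the agglomerative clustering loop, which is why the paper's GPU specialization of exactly this computation yields the reported 50–250× speedup.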

