Journal: IEEE Transactions on Audio, Speech, and Language Processing

Prosodic and other Long-Term Features for Speaker Diarization


Abstract

Speaker diarization is defined as the task of determining "who spoke when" given an audio track and no other prior knowledge of any kind. The following article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized datasets (NIST RT) and show a consistent relative improvement of about 30% in diarization error rate compared to the best system presented at the NIST evaluation in 2007.
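
The abstract reports its gain as a relative change in diarization error rate (DER), which is easy to confuse with an absolute change in percentage points. The sketch below shows the usual arithmetic for a relative reduction. The function name and the two DER values are hypothetical and chosen only for illustration; they are not results from the article.

```python
def relative_der_improvement(baseline_der: float, new_der: float) -> float:
    """Return the relative reduction in DER as a fraction of the baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical DER values (as fractions), NOT taken from the article.
baseline = 0.20  # assumed baseline system
improved = 0.14  # assumed system with long-term features added

print(f"relative improvement: {relative_der_improvement(baseline, improved):.0%}")
```

With these assumed values the absolute DER drops by 6 percentage points, while the relative improvement is 30% of the baseline error.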
