Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations



Abstract

Overlapping speech has been identified as one of the main sources of error in the diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown that overlapping speech is more likely to occur at specific parts of a conversation. They have also shown that overlap occurrence is correlated with various conversational features such as speech, silence patterns and speaker turn changes. We use features that capture this higher-level information from the structure of a conversation, such as silence and speaker change statistics, to improve an acoustic feature-based classifier of overlapping and single-speaker speech classes. The silence and speaker change statistics are computed over a long-term window (around 3-4 seconds) and are used to predict the probability of overlap in the window. These estimates are then incorporated into an acoustic feature-based classifier as prior probabilities of the classes. Experiments conducted on three corpora (AMI, NIST-RT and ICSI) show that the proposed method improves the performance of the acoustic feature-based overlap detector on all three corpora. They also reveal that the model that estimates the probability of overlap from long-term conversational features, learned on the AMI corpus, generalizes to meetings from the other corpora (NIST-RT and ICSI). Moreover, experiments on the ICSI corpus reveal that the proposed method also improves laughter overlap detection. Consequently, applying overlap handling techniques to speaker diarization using the detected overlap reduces the diarization error rate (DER) on all three corpora.
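The idea of using long-term window statistics as class priors can be illustrated with a small sketch. This is not the authors' implementation: the function names, the logistic form of the prior model, and its weights are all assumptions chosen for illustration, and the abstract does not say which model maps the statistics to a probability. The window is given in frames; at a hypothetical 10 ms frame shift, a 3-4 second window would span roughly 300-400 frames.

```python
import math


def window_stats(speech, speaker, center, half_width):
    """Return (silence fraction, speaker-change count) for a window of frames.

    speech  : list of booleans, True where a frame contains speech
    speaker : list of speaker labels, one per frame (ignored on silence)
    """
    lo = max(0, center - half_width)
    hi = min(len(speech), center + half_width + 1)
    frames = range(lo, hi)
    silence_fraction = sum(1 for t in frames if not speech[t]) / (hi - lo)
    # Count label changes between consecutive speech frames in the window.
    spoken = [speaker[t] for t in frames if speech[t]]
    changes = sum(1 for a, b in zip(spoken, spoken[1:]) if a != b)
    return silence_fraction, changes


def overlap_prior(silence_fraction, changes, w_sil=-2.0, w_chg=0.8, bias=-2.0):
    """Hypothetical logistic map from window statistics to P(overlap).

    The weights are made-up placeholders, not values from the paper.
    """
    z = bias + w_sil * silence_fraction + w_chg * changes
    return 1.0 / (1.0 + math.exp(-z))


def is_overlap(loglik_overlap, loglik_single, prior):
    """Bayes decision: add the log-prior to each class log-likelihood."""
    score_overlap = loglik_overlap + math.log(prior)
    score_single = loglik_single + math.log(1.0 - prior)
    return score_overlap > score_single


# Toy segmentation: speaker A, a short pause, then B, then A again.
speech = [True, True, True, False, True, True, True, True]
speaker = ["A", "A", "A", None, "B", "B", "A", "A"]

sil, chg = window_stats(speech, speaker, center=4, half_width=3)
prior = overlap_prior(sil, chg)

# The same acoustic log-likelihoods give different decisions
# under a flat prior and under the window-based prior.
flat_decision = is_overlap(-10.0, -10.5, 0.5)
context_decision = is_overlap(-10.0, -10.5, prior)
```

In this toy case the acoustic scores slightly favour overlap, so a flat prior labels the frame as overlap, while the window-based prior (below 0.5 with these placeholder weights) tips the decision to single-speaker speech. That is the mechanism the abstract describes: the conversational context changes the decision threshold of the acoustic classifier rather than replacing it.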
