
Dictation of multiparty conversation using statistical turn taking model and speaker model


Abstract

A new speech decoder for multiparty conversation is proposed. Multiparty conversation denotes a situation in which many speakers talk to each other. Almost all conventional speech recognition systems assume that the input consists of a single speaker's voice. However, some applications, such as dialogue dictation and voice interfaces for multiple users, must deal with the mixed voices of several speakers. In such a situation, the system has to recognize not only the word sequence of the input speech but also the speaker of each part of it. We therefore propose a decoder that uses not only an acoustic model and a language model, the resources of a conventional single-user speech decoder, but also a statistical turn-taking model and speaker models. This framework realizes simultaneous maximum likelihood estimation of the spoken word sequence and the speaker sequence. Experimental results on a TV sports news program show that the proposed method reduces the word error rate by 7.7% and the speaker error rate by 97.8% compared to the conventional method.
