Conference: International Workshop on Machine Learning for Multimodal Interaction

Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System


Abstract

We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year’s system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons. In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both MDM and IHM conditions compared to last year’s evaluation system. Results on lecture data are comparable to the best reported results for that task.
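The abstract reports its gain as "17% relative", which is a proportional reduction in error rate rather than a drop of 17 absolute percentage points. The sketch below illustrates that arithmetic only. It assumes the metric is word error rate (WER), and the function name and the two WER values are hypothetical, chosen to make the calculation easy to follow; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), NOT taken from the paper.
baseline = 40.0
improved = 33.2

reduction = relative_wer_reduction(baseline, improved)
absolute_drop = baseline - improved

print(f"{reduction:.1%} relative, {absolute_drop:.1f} points absolute")
```

With these illustrative inputs, a 17% relative reduction corresponds to a 6.8-point absolute drop; the same relative figure implies a smaller absolute drop when the baseline error rate is lower.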
