Conference: IEEE International Conference on Acoustics Speech and Signal; ICASSP 2010

Automatic sentence boundary detection in conversational speech: A cross-lingual evaluation on English and Czech

Abstract

Automatic sentence segmentation of speech is important for enriching speech recognition output and aiding downstream language processing. This paper focuses on automatic sentence segmentation of speech in two languages, English and Czech. For this task, we compare and combine three statistical models: an HMM, a maximum entropy model, and the boosting-based model BoosTexter. All three approaches rely on both textual and prosodic information. We evaluate them on a corpus of multiparty meetings in English and on a corpus of broadcast conversations in Czech, using both manual and speech recognition transcripts. The experiments show that superior results are achieved when all three models are combined via posterior probability interpolation. We observe differences in model performance between English and Czech, as well as differences in how the prosodic models use features in the two languages. Overall, the analysis is important for porting sentence segmentation approaches from one language to another.
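
As a rough illustration of combining models via posterior probability interpolation, the sketch below linearly interpolates per-boundary posteriors from three classifiers and thresholds the result. The abstract does not give the exact scheme, so the linear form, the weights, the 0.5 threshold, the function names, and the example posteriors are all assumptions made for illustration, not the authors' implementation or data.

```python
from typing import List, Sequence


def interpolate_posteriors(
    posteriors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Linearly interpolate boundary posteriors from several models.

    posteriors[m][i] is model m's estimate of P(sentence boundary) at
    inter-word position i. The weights are assumed to sum to 1.
    """
    if len(posteriors) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(posteriors[0])
    if any(len(p) != n for p in posteriors):
        raise ValueError("all models must score the same positions")
    return [
        sum(w * p[i] for w, p in zip(weights, posteriors))
        for i in range(n)
    ]


def detect_boundaries(combined: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a boundary wherever the combined posterior reaches the threshold."""
    return [p >= threshold for p in combined]


# Made-up posteriors for three inter-word positions (illustrative only).
hmm = [0.9, 0.2, 0.6]
maxent = [0.8, 0.1, 0.4]
boostexter = [0.7, 0.3, 0.2]

# Hypothetical weights; in practice they would be tuned on held-out data.
combined = interpolate_posteriors([hmm, maxent, boostexter], [0.4, 0.3, 0.3])
boundaries = detect_boundaries(combined)
```

With these made-up inputs, only the first position ends up above the threshold. In a setup like this, the weights and threshold would normally be chosen on a development set separately for each language and transcript type.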