Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)

Topic and style-adapted language modeling for Thai broadcast news ASR



Abstract

The amount of transcribed Thai broadcast news text available for training a language model is still very limited compared with other major languages. Because constructing a broadcast news corpus is costly and time-consuming, newspaper text is often used to enlarge the training data. This paper proposes a topic and style adaptation approach for the language model of a Thai broadcast news ASR system, using broadcast news and newspaper text. A rule-based speaking-style classification method, which relies on the presence of certain specific words, is applied to classify the training text. Various kinds of language models adapted to topics and styles are studied and shown to reduce test-set perplexity and recognition error rate. The results also show that written-style text from newspapers can be employed to alleviate the sparseness of the broadcast news corpus, while spoken-style text from the broadcast news corpus remains essential for building a reliable language model.
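The rule-based style classification mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes sentences are already word-segmented into token lists, and the marker words below (common Thai spoken-style particles) are hypothetical stand-ins: the abstract does not list the words the authors actually used, so this is not their rule set.

```python
# Hypothetical marker words (assumption): the abstract does not list the actual ones.
SPOKEN_MARKERS = frozenset({"ครับ", "ค่ะ", "นะ"})


def classify_style(tokens, markers=SPOKEN_MARKERS):
    """Label a tokenized sentence "spoken" if any marker word occurs, else "written"."""
    return "spoken" if any(tok in markers for tok in tokens) else "written"


def split_by_style(sentences, markers=SPOKEN_MARKERS):
    """Partition tokenized sentences into spoken-style and written-style buckets."""
    buckets = {"spoken": [], "written": []}
    for tokens in sentences:
        buckets[classify_style(tokens, markers)].append(tokens)
    return buckets
```

Each bucket could then serve as training text for a style-specific language model; how the resulting models are combined is not specified in the abstract and is left out here.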
