Journal: Computer Speech and Language

Contemporaneous text as side-information in statistical language modeling


Abstract

We propose new methods to exploit contemporaneous text, such as on-line news articles, to improve language models for automatic speech recognition and other natural language processing applications. In particular, we investigate the use of text from a resource-rich language to sharpen language models for processing a news story or article in a language with scarce linguistic resources. We demonstrate that even with fairly crude cross-language information retrieval and simple machine translation, one can construct story-specific Chinese language models which exploit cues from a side-corpus of English newswire to significantly improve the performance of language models estimated from a static Chinese corpus. Our investigations cover cases when the amount of available Chinese text is small, and a case when a large Chinese text corpus is available. We examine the effectiveness of our techniques both when the side-corpus contains English documents that are near-translations of the Chinese documents being processed, and when the English side-corpus is merely from contemporaneous and independent news sources. We present experimental results for automatic transcription of speech from the Mandarin Broadcast News corpus.
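The abstract does not spell out how the story-specific models are built, so the following is only a minimal sketch of one way the idea could work. It assumes unigram models, a toy word-level translation table, and linear interpolation with a fixed weight. The function names, the weight `lam`, and all data are hypothetical, and the cross-language retrieval step is assumed to have already selected the English document.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def translate_unigrams(english_probs, translation_table):
    """Project English unigram probabilities into Chinese:
    P(c) = sum over e of P(c | e) * P(e), renormalized over covered words."""
    chinese = {}
    for e, p_e in english_probs.items():
        for c, p_c_given_e in translation_table.get(e, {}).items():
            chinese[c] = chinese.get(c, 0.0) + p_c_given_e * p_e
    total = sum(chinese.values())
    if total == 0:
        return {}
    return {c: p / total for c, p in chinese.items()}


def interpolate(static_probs, story_probs, lam):
    """Linear interpolation: lam * story-specific + (1 - lam) * static."""
    vocab = set(static_probs) | set(story_probs)
    return {
        w: lam * story_probs.get(w, 0.0) + (1 - lam) * static_probs.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy data (not from the paper).
static_lm = unigram_model(["经济", "经济", "选举", "天气"])

# Assumed output of cross-language retrieval: one English news story.
retrieved_english = ["election", "election", "vote", "economy"]

# Assumed word-level translation probabilities P(chinese | english).
translation_table = {
    "election": {"选举": 1.0},
    "vote": {"投票": 0.5, "选举": 0.5},
    "economy": {"经济": 1.0},
}

story_lm = translate_unigrams(unigram_model(retrieved_english), translation_table)
adapted_lm = interpolate(static_lm, story_lm, lam=0.5)
```

In this toy setup the adapted model shifts probability mass toward words suggested by the English story (here "选举") while remaining a proper distribution. A real system would presumably use higher-order n-grams and tune the interpolation weight on held-out data.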
