IEEE Transactions on Audio, Speech, and Language Processing

Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages With Application to Dialectal Arabic


Abstract

Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation because of its rich morphology. In the absence of large quantities of data, rich morphology results in a large increase in the out-of-vocabulary (OOV) rate and in poor estimation of language model parameters. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, along with additional available information sources pertaining to those segments and lexical items, in a single joint model. Jointly representing and modeling morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word-based and morpheme-based trigram language models. We also show that as the integration between different information sources becomes tighter, both speech recognition and machine translation performance improve.
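The OOV argument above can be illustrated with a toy example. This is not JMLLM itself. It is a minimal sketch, assuming four invented words in Buckwalter-style transliteration and a hand-written prefix/stem/suffix table; `SEGMENTS`, `segment`, and `oov_rate` are hypothetical helpers, not the paper's segmenter or code. It shows how unseen inflected forms can still be covered by morphemes that did appear in training.

```python
# Toy illustration with hypothetical data: morpheme segmentation lets unseen
# inflected word forms be covered by morphemes already seen in training.

def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)

# Hypothetical hand-written segmentation (Buckwalter-style transliteration),
# invented for this example only.
SEGMENTS = {
    "wktAbh": ["w+", "ktAb", "+h"],   # "and his book"
    "ktAbhm": ["ktAb", "+hm"],        # "their book"
    "wqlmh": ["w+", "qlm", "+h"],     # "and his pen"
    "qlmhm": ["qlm", "+hm"],          # "their pen"
}

def segment(words):
    """Replace each word by its morphemes; unknown words are kept whole."""
    out = []
    for w in words:
        out.extend(SEGMENTS.get(w, [w]))
    return out

train_words = ["wktAbh", "qlmhm"]
test_words = ["ktAbhm", "wqlmh"]

word_oov = oov_rate(train_words, test_words)
morph_oov = oov_rate(segment(train_words), segment(test_words))
print(f"word OOV: {word_oov:.2f}, morpheme OOV: {morph_oov:.2f}")
# prints "word OOV: 1.00, morpheme OOV: 0.00"
```

Replacing words with morphemes alone shortens the word-level context that a fixed-order n-gram can see, which is why the abstract stresses keeping the predictive power of whole words alongside the morphological segments rather than discarding them.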