Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Investigations on the use of morpheme level features in Language Models for Arabic LVCSR

Abstract

A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high out-of-vocabulary (OOV) rates and poorly estimated Language Model (LM) probabilities. In such cases, morphemes are considered a better choice of LM unit than full words, since they yield higher lexical coverage and lower LM perplexity. On the other hand, an effective way to increase the robustness of LMs is to incorporate word features into them. In this paper, we investigate the use of features derived for morphemes rather than words, thereby combining the benefits of morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based, and Factored LMs (FLMs) estimated over sequences of morphemes and their features for Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.
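The coverage argument in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `segment` function is a hypothetical toy affix stripper with one prefix and one suffix, the words are a tiny made-up sample in a Buckwalter-style transliteration, and none of it reproduces the segmentation or data used in the paper.

```python
def segment(word, prefix="w", suffix="hm", min_stem=3):
    """Toy affix stripper (hypothetical rules, not a real Arabic analyzer).

    Splits off one prefix and one suffix if the remaining stem keeps
    at least `min_stem` characters. Affixes are marked with '+'.
    """
    units = []
    if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
        units.append(prefix + "+")
        word = word[len(prefix):]
    tail = None
    if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
        tail = "+" + suffix
        word = word[:-len(suffix)]
    units.append(word)
    if tail:
        units.append(tail)
    return units


def oov_rate(train_units, test_units):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_units)
    return sum(u not in vocab for u in test_units) / len(test_units)


# Made-up toy data: the same stems appear with different affixes.
train_words = ["ktAb", "wqlm", "qlmhm"]
test_words = ["wktAb", "ktAbhm", "qlm"]

# Word-level units: every test word is an unseen surface form.
word_oov = oov_rate(train_words, test_words)

# Morpheme-level units: stems and affixes are shared across forms.
train_morphs = [m for w in train_words for m in segment(w)]
test_morphs = [m for w in test_words for m in segment(w)]
morph_oov = oov_rate(train_morphs, test_morphs)

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morph_oov:.2f}")
```

On this contrived sample every test word is unseen as a full word, while all of its morphemes occur in training. Real corpora show a much less extreme gap, and the sketch says nothing about perplexity or WER.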