Affix-augmented stem-based language model for Persian

Abstract

Language modeling is used in many NLP applications, such as machine translation, POS tagging, speech recognition, and information retrieval. A language model assigns a probability to a sequence of words. This task becomes challenging for highly inflectional languages. In this paper we investigate standard statistical language models on Persian, an inflectional language. We propose two variations of morphological language models that rely on a morphological analyzer to manipulate the dataset before modeling. We then discuss the shortcomings of these models and introduce a novel approach that exploits the structure of the language and produces more accurate results. Experimental results are encouraging, especially when we use n-gram models with a small training dataset.
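
To make "assigns a probability to a sequence of words" concrete, here is a minimal sketch of a standard bigram model. It is not the paper's method: the abstract does not state the n-gram order or the smoothing scheme, so the bigram order and add-one smoothing are assumptions, and the function names `train_bigram` and `sentence_logprob` and the toy English corpus are purely illustrative.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (with boundary markers)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def sentence_logprob(tokens, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model (assumed smoothing)."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    logp = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return logp


# Toy corpus, illustrative only (not Persian data).
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
unigrams, bigrams = train_bigram(corpus)
seen = sentence_logprob(["the", "cat", "sleeps"], unigrams, bigrams)
unseen = sentence_logprob(["sleeps", "cat", "the"], unigrams, bigrams)
print(seen > unseen)
```

In a highly inflectional language each stem appears in many surface forms, so word-level counts like these become sparse on small corpora. That is the motivation the abstract gives for preprocessing the data with a morphological analyzer before modeling.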