Orthographic enrichment for Arabic grammatical analysis.

Abstract

Arabic orthography poses two problems for automatic processing: (1) it omits short vowels, which creates ambiguity because the same orthographic form can be pronounced in many different ways, each of which can have its own grammatical category; and (2) an Arabic word may contain several units, such as pronouns, conjunctions, articles, and prepositions, with no intervening white space. The thesis proposes a pre-processing scheme that applies word segmentation and word vocalization for the purpose of grammatical analysis, namely part-of-speech (POS) tagging and parsing. It first examines the impact of human-produced vocalization and segmentation on the grammatical analysis of Arabic, then applies a pipeline of automatic vocalization and segmentation to Arabic POS tagging. The pipeline, together with the POS tags it produces, is then used for dependency parsing, which yields the grammatical relations between the words in a sentence. The study uses a memory-based algorithm for vocalization, segmentation, and POS tagging, and the natural language parser MaltParser for dependency parsing. The thesis represents the first approach to the processing of real-world Arabic, and finds that, through the correct choice of features and algorithms, the need for pre-processing for grammatical analysis can be minimized.
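To make the idea of memory-based segmentation concrete, here is a minimal sketch that is not taken from the thesis. It assumes segmentation is cast as per-character classification: each character gets a fixed-width context window, training instances are stored verbatim, and a new window is labeled by its nearest stored neighbors under a simple overlap (mismatch count) distance. The window size, the feature layout, the function names, and the handful of Buckwalter-style transliterated training words are all illustrative assumptions, and the thesis's actual features and software may differ.

```python
from collections import Counter

def windows(word, size=2):
    """Yield one character window per position, padded with '_' at the edges."""
    padded = "_" * size + word + "_" * size
    for i in range(len(word)):
        yield tuple(padded[i:i + 2 * size + 1])

def instances(segmented):
    """Turn e.g. 'w+Al+ktAb' into (window, label) pairs.

    Label '+' means a segment boundary follows the focus character,
    '-' means no boundary follows it.
    """
    word = segmented.replace("+", "")
    labels = []
    for ch in segmented:
        if ch == "+":
            labels[-1] = "+"
        else:
            labels.append("-")
    return list(zip(windows(word), labels))

def overlap(a, b):
    """Overlap distance: number of window positions that differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(memory, window, k=1):
    """Majority label among the k stored instances closest to `window`."""
    nearest = sorted(memory, key=lambda inst: overlap(inst[0], window))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def segment(memory, word, k=1):
    """Insert '+' after every character classified as segment-final."""
    out = []
    for ch, window in zip(word, windows(word)):
        out.append(ch)
        if classify(memory, window, k) == "+":
            out.append("+")
    # A boundary predicted after the last character carries no information.
    return "".join(out).rstrip("+")

# Toy training data (hypothetical, hand-written for this sketch only).
TRAINING = ["w+ktb", "w+Al+ktAb", "b+Al+byt", "ktAb", "Al+byt"]
memory = [inst for seg in TRAINING for inst in instances(seg)]

if __name__ == "__main__":
    print(segment(memory, "wAlbyt"))
```

The same store-and-compare scheme could in principle be reused for vocalization or POS tagging by changing what the label stands for, which is one reason a single memory-based learner can serve several stages of a pipeline like the one described above.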

Bibliographic record

  • Author: Mohamed, Emad
  • Author affiliation: Indiana University
  • Degree-granting institution: Indiana University
  • Subject: Language Linguistics; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 253
  • Original format: PDF
  • Language: English