Journal: ACM Transactions on Asian Language Information Processing

Exploiting Separation of Closed-Class Categories for Arabic Tokenization and Part-of-Speech Tagging


Abstract

Research on the problem of morphological disambiguation of Arabic has noted that techniques developed for lexical disambiguation in English do not easily transfer over, since the affixation present in Arabic creates a very different tag set than for English, encoding both inflectional morphology and more complex tokenization sequences. This work takes a new approach to this problem based on a distinction between the open-class and closed-class categories of tokens, which differ both in their frequencies and in their possible morphological affixations. This separation simplifies the morphological analysis problem considerably, making it possible to use a Conditional Random Field model for joint tokenization and "core" part-of-speech tagging of the open-class items, while the closed-class items are handled by regular expressions. This work is therefore situated between data-driven approaches and those that use a morphological analyzer. For the tasks of tokenization and core part-of-speech tagging, the resulting system outperforms, on the given test set, a system that incorporates a morphological analyzer. We also evaluate the effects of the differences on parser performance when the tagger output is used for parser input.
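
The split described in the abstract, with closed-class items handled by regular expressions and open-class items left to a statistical model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the tag labels, and the single pattern (three prepositions with an optional conjunction clitic, written in Buckwalter-style transliteration) are hypothetical. The Conditional Random Field stage is not implemented here; the sketch only shows how words might be routed.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical closed-class pattern (illustrative only, not the paper's rules).
# It matches the prepositions fy / mn / ElY, optionally preceded by a
# conjunction clitic w- or f-, in Buckwalter-style transliteration.
CLOSED_CLASS_PATTERNS = [
    (re.compile(r"^(?P<conj>[wf])?(?P<stem>fy|mn|ElY)$"), "PREP"),
]

Analysis = List[Tuple[str, str]]  # list of (token, tag) pairs


def analyze_closed_class(word: str) -> Optional[Analysis]:
    """Return a tokenized, tagged analysis if the word matches a
    closed-class pattern, otherwise None."""
    for pattern, tag in CLOSED_CLASS_PATTERNS:
        match = pattern.match(word)
        if match:
            tokens: Analysis = []
            if match.group("conj"):
                tokens.append((match.group("conj"), "CONJ"))
            tokens.append((match.group("stem"), tag))
            return tokens
    return None


def route(words: List[str]):
    """Split a sentence into closed-class analyses and open-class words.
    Open-class words would be passed on to a statistical tagger
    (a CRF in the abstract's description), which is out of scope here."""
    closed, open_class = [], []
    for index, word in enumerate(words):
        analysis = analyze_closed_class(word)
        if analysis is None:
            open_class.append((index, word))
        else:
            closed.append((index, analysis))
    return closed, open_class
```

For example, `route(["wmn", "ktAb"])` would place `wmn` in the closed-class list, split into a conjunction and a preposition, and leave `ktAb` for the open-class model.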
