Conference: International Workshop on Finite State Methods and Natural Language Processing

Using Meta-Morph Rules to develop Morphological Analysers: A case study concerning Tamil

Abstract

This paper describes a new and larger-coverage Finite-State Morphological Analyser (FSM) and Generator for the Dravidian language Tamil. The FSM has been developed in the context of computational grammar engineering, adhering to the standards of the ParGram effort. Tamil is a morphologically rich language and the interaction between linguistic analysis and formal implementation is complex, resulting in a challenging task. In order to allow the development of the FSM to focus more on the linguistic analysis and less on the formal details, we have developed a system of meta-morph(ology) rules along with a script which translates these rules into FSM-processable representations. The introduction of meta-morph rules makes it possible for computationally naive linguists to interact with the system and to expand it in future work. We found that the meta-morph rules help to express linguistic generalisations and reduce the manual effort of writing lexical classes for morphological analysis. Our Tamil FSM currently handles mainly the inflectional morphology of 3,300 verb roots and their 260 forms. Further, it also has a lexicon of approximately 100,000 nouns along with a guesser to handle out-of-vocabulary items. Although the Tamil FSM was primarily developed to be part of a computational grammar, it can also be used as a web or stand-alone application for other NLP tasks, as per general ParGram practice.
