Conference: International Conference on Information Technology Research

ThamizhiFST: A Morphological Analyser and Generator for Tamil Verbs

Abstract

ThamizhiFST is a Morphological Analyser and Generator (MAG) for Tamil. It was developed to extend the coverage of the computational Tamil grammar being developed using Lexical Functional Grammar (LFG). As an initial step, ThamizhiFST covers the simple verbs of Tamil. The MAG was developed with a Finite State Transducer (FST) approach and implemented using the open source software FOMA. Because morphological rules are finite and known in advance, a rule-based approach such as FST is more appropriate than possible machine learning alternatives, especially for achieving the reliably high accuracy that computational grammar development requires. A set of 3250 Tamil verb lemmas from 13 paradigms, together with their 260 conjugation forms, was used in the construction of ThamizhiFST. Further, a set of 27 labels was used to mark the morphosyntactic information of the verbs. The whole system was developed as a three-layer web-based system to tackle the issues that arise when processing an agglutinative language like Tamil and to ensure its extensibility. Unlike other existing MAGs, ThamizhiFST also provides the morpheme corresponding to each morphosyntactic label and marks morpheme boundaries. An evaluation shows that ThamizhiFST has an f-measure of 0.97 for simple verbs. Current and future work includes extending the system to cover more verbs and nouns and making it generally available.
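
To make the idea of a combined analyser and generator concrete, the following is a minimal sketch in plain Python, not FOMA and not the ThamizhiFST implementation. It treats morphology as a finite relation between surface forms and analyses, which can be read in either direction, and it reports the morpheme behind each label together with the morpheme boundaries. The single stem, the suffix tables, the label names (`past`, `pres`, `fut`, `1sg`, `1pl`) and the function names are all illustrative assumptions, written in ISO 15919 transliteration; they are not the paper's lexicon or its 27-label set.

```python
from itertools import product
from typing import NamedTuple

# Hypothetical toy data (ISO 15919 transliteration), one verb pattern only.
# None of this is taken from the ThamizhiFST lexicon or label set.
STEMS = ["paṭi"]                                      # 'read, study'
TENSE = {"past": "tt", "pres": "kkiṟ", "fut": "pp"}   # tense label -> morpheme
PNG = {"1sg": "ēṉ", "1pl": "ōm"}                      # person/number label -> morpheme


class Analysis(NamedTuple):
    lemma: str
    labels: tuple      # e.g. ("past", "1sg")
    morphemes: tuple   # morpheme realising each label, e.g. ("tt", "ēṉ")

    def segmented(self) -> str:
        # Mark morpheme boundaries with "-"; in this toy the stem equals the lemma.
        return "-".join((self.lemma,) + self.morphemes)


def build_relation():
    """Enumerate the finite relation once and index it in both directions."""
    by_surface, by_analysis = {}, {}
    for stem, (t, t_morph), (p, p_morph) in product(STEMS, TENSE.items(), PNG.items()):
        surface = stem + t_morph + p_morph
        by_surface.setdefault(surface, []).append(Analysis(stem, (t, p), (t_morph, p_morph)))
        by_analysis[(stem, t, p)] = surface
    return by_surface, by_analysis


BY_SURFACE, BY_ANALYSIS = build_relation()


def analyse(surface: str) -> list:
    """Surface form -> list of analyses (empty if the form is not covered)."""
    return BY_SURFACE.get(surface, [])


def generate(lemma: str, tense: str, png: str):
    """Lemma plus labels -> surface form, or None if not covered."""
    return BY_ANALYSIS.get((lemma, tense, png))


if __name__ == "__main__":
    form = generate("paṭi", "past", "1sg")
    print(form)
    for a in analyse(form):
        print(a.segmented(), dict(zip(a.labels, a.morphemes)))
```

A real FST compiles the same kind of relation into a compact automaton and handles spelling changes at morpheme boundaries with rewrite rules, which this lookup-table sketch does not attempt.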