
Italian Lemmatization by Rules with Getaruns


Abstract

We present an approach to lemmatization based on exhaustive morphological analysis and on external knowledge sources that help with disambiguation, which is the most relevant issue to address. Our system, GETARUNS, was not originally concerned with lemmatization directly and used morphological analysis only as a backoff solution when a word was not found in the available wordform dictionaries. We found that both the rules and the root dictionary needed amending. This work started during development, before the test set was distributed, but was not completed for lack of time. The final task results therefore reflect only an incomplete system, which has since been completed, with a rather different outcome: we moved from 98.42 to 99.82 on the test set and from 99.82 to 99.91 on the dev set. These results are produced by rules, so they do not depend on statistical evaluation, which may change with different training sets. In this version of the paper we perform additional experiments with wordform dictionaries of Italian that are freely available online.
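The dictionary-then-rules arrangement mentioned in the abstract can be illustrated with a minimal sketch. This is not GETARUNS code: the tiny `WORDFORMS`, `ROOTS`, and `SUFFIXES` tables and the `lemmatize` function are hypothetical, and real Italian morphology needs far richer rules plus the disambiguation step the abstract emphasizes.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources for illustration only; not GETARUNS data.
WORDFORMS: Dict[str, str] = {"case": "casa", "sono": "essere"}  # wordform -> lemma
ROOTS: Dict[str, str] = {"parl": "parlare", "libr": "libro"}    # root -> lemma
SUFFIXES: List[str] = ["iamo", "ano", "i", "o", "a"]            # longest first


def lemmatize(word: str) -> Tuple[str, str]:
    """Return (lemma, source), where source names the step that produced it."""
    w = word.lower()
    # Step 1: direct lookup in the wordform dictionary.
    if w in WORDFORMS:
        return WORDFORMS[w], "dictionary"
    # Step 2: backoff to rule-based analysis: strip a suffix, look up the root.
    for suffix in SUFFIXES:
        if w.endswith(suffix):
            root = w[: -len(suffix)]
            if root in ROOTS:
                return ROOTS[root], "rules"
    # Step 3: nothing matched; return the lowercased form unchanged.
    return w, "unknown"


if __name__ == "__main__":
    for token in ["case", "Parliamo", "libri", "xyz"]:
        print(token, lemmatize(token))
```

In this sketch the suffix list is ordered longest first so that `parliamo` is analysed as `parl` + `iamo` and not as `parliam` + `o`; an ambiguous form with several valid analyses would still need a separate disambiguation step.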
