Conference: International Workshop for Computational Linguistics of Uralic Languages

FiST - towards a Free Semantic Tagger of Modern Standard Finnish

Abstract

This paper introduces work in progress on implementing a free full-text semantic tagger for Finnish, FiST. The tagger is based on a 46,226-lexeme semantic lexicon of Finnish that was published in 2016. The basis of the semantic lexicon was developed in the early 2000s in the EU-funded Benedict project (Löfberg et al., 2005). Löfberg (2017) describes the compilation of the lexicon and evaluates a proprietary version of the Finnish Semantic Tagger, the FST. The FST and its lexicon were developed using the English Semantic Tagger (the EST) of Lancaster University as a model. The EST was developed at the University Centre for Corpus Research on Language (UCREL) at Lancaster University as part of the UCREL Semantic Analysis System (USAS) framework. The semantic lexicon of the USAS framework is based on modified and enriched categories of the Longman Lexicon of Contemporary English (McArthur, 1981). We have implemented a basic working version of a new full-text semantic tagger for Finnish based on freely available components. The implementation uses Omorfi and FinnPos for morphological analysis of Finnish words. After the morphological recognition phase, words from the 46K semantic lexicon are matched against the morphologically unambiguous base forms. In our comprehensive tests, the lexical tagging coverage of the current implementation is around 82-90% across different text types. The present version still needs some enhancements: at least the processing of semantic ambiguity of words and the analysis of compounds, and perhaps also the treatment of multiword expressions. A semantically marked ground-truth evaluation collection should also be established for evaluating the tagger.
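The lookup step and the coverage figure described in the abstract can be illustrated with a minimal Python sketch. This is not the FiST implementation: the three-entry lexicon, the function names `tag_lemmas` and `lexical_coverage`, and the semantic tags attached to each base form are hypothetical stand-ins. The sketch assumes that a morphological analyser such as Omorfi with FinnPos has already reduced each token to a single base form, and it does not call those tools.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: base form -> candidate semantic tags.
# The real lexicon has 46,226 lexemes; these entries are illustrative only.
TOY_LEXICON: Dict[str, List[str]] = {
    "koira": ["L2"],
    "juosta": ["M1"],
    "talo": ["H1"],
}

Tagged = Tuple[str, str, Optional[List[str]]]


def tag_lemmas(pairs: List[Tuple[str, str]],
               lexicon: Dict[str, List[str]]) -> List[Tagged]:
    """Look up each (surface form, base form) pair in the lexicon.

    The base forms are assumed to come from an earlier morphological
    analysis step; tokens whose base form is missing get None.
    """
    tagged: List[Tagged] = []
    for surface, lemma in pairs:
        tags = lexicon.get(lemma.lower())
        tagged.append((surface, lemma, tags))
    return tagged


def lexical_coverage(tagged: List[Tagged]) -> float:
    """Share of tokens that received at least one semantic tag."""
    if not tagged:
        return 0.0
    hits = sum(1 for _, _, tags in tagged if tags)
    return hits / len(tagged)


if __name__ == "__main__":
    # "Koira juoksi taloon nopeasti" with hand-written base forms.
    sample = [("Koira", "koira"), ("juoksi", "juosta"),
              ("taloon", "talo"), ("nopeasti", "nopeasti")]
    result = tag_lemmas(sample, TOY_LEXICON)
    print(lexical_coverage(result))  # 0.75: three of four tokens matched
```

A token that lists several candidate tags still counts as covered here, so the sketch says nothing about semantic disambiguation, compounds, or multiword expressions, which the abstract names as remaining work.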
