Informatica: An International Journal of Computing and Informatics

*MWELex – MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora

Abstract

The paper presents *MWELex, a multilingual lexical repository of Croatian, Slovene and Serbian multiword expressions (MWEs) extracted from parsed corpora. The lexica were built with DepMWEx, a custom tool that uses dependency-syntactic patterns to identify MWE candidates in parse trees. The extracted candidates are then scored by co-occurrence and organized by headword, yielding a resource of 23 to 48 thousand headwords and 3.2 to 12 million MWE candidates per language. An evaluation of the lexica for Croatian and Slovene shows an overall precision of just over 50% for Croatian and as high as 85% for Slovene. Precision also varies greatly across syntactic patterns: from 0.167 to 0.859 for Croatian and from 0.158 to 1.00 for Slovene. Possible extensions of the tool are demonstrated on a simple distribution-based extraction of non-transparent MWEs and on cross-lingual linking of the extracted lexica.
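The pipeline described above (match dependency patterns in parse trees, score the matches by co-occurrence, group them by headword) can be illustrated with a minimal sketch. It is not the DepMWEx implementation. Several parts are assumptions not taken from the abstract: the `Token` record, which mimics CoNLL-U columns; the two hypothetical patterns in `PATTERNS`, which use Universal Dependencies-style relation names; and logDice as the co-occurrence measure, with overall lemma frequencies as the marginals. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class Token:
    id: int      # 1-based position in the sentence
    lemma: str
    upos: str    # part-of-speech tag of the token
    head: int    # id of the governing token; 0 marks the root
    deprel: str  # dependency relation to the governor


# Hypothetical pattern inventory: (head POS, relation, dependent POS) -> pattern name.
# The real DepMWEx patterns are language-specific and are not reproduced here.
PATTERNS = {
    ("NOUN", "amod", "ADJ"): "ADJ+NOUN",
    ("VERB", "obj", "NOUN"): "VERB+obj:NOUN",
}


def extract_candidates(sentence):
    """Yield (headword, pattern, dependent) for each arc that matches a pattern."""
    by_id = {tok.id: tok for tok in sentence}
    for dep in sentence:
        head = by_id.get(dep.head)
        if head is None:  # the root has no governor inside the sentence
            continue
        pattern = PATTERNS.get((head.upos, dep.deprel, dep.upos))
        if pattern is not None:
            yield head.lemma, pattern, dep.lemma


def log_dice(f_xy, f_x, f_y):
    """logDice association score (assumed measure): 14 + log2(2*f_xy / (f_x + f_y))."""
    return 14 + log2(2 * f_xy / (f_x + f_y))


def build_lexicon(corpus):
    """Score every candidate and group candidates under their headword, best first."""
    lemma_freq = Counter(tok.lemma for sent in corpus for tok in sent)
    cand_freq = Counter(c for sent in corpus for c in extract_candidates(sent))
    lexicon = defaultdict(list)
    for (head, pattern, dep), f_xy in cand_freq.items():
        score = log_dice(f_xy, lemma_freq[head], lemma_freq[dep])
        lexicon[head].append((pattern, dep, f_xy, round(score, 3)))
    for entries in lexicon.values():
        entries.sort(key=lambda entry: entry[3], reverse=True)
    return dict(lexicon)


# Toy Croatian corpus: "Crna rupa guta svjetlost." / "Vidimo crnu rupu."
corpus = [
    [Token(1, "crn", "ADJ", 2, "amod"), Token(2, "rupa", "NOUN", 3, "nsubj"),
     Token(3, "gutati", "VERB", 0, "root"), Token(4, "svjetlost", "NOUN", 3, "obj")],
    [Token(1, "vidjeti", "VERB", 0, "root"), Token(2, "crn", "ADJ", 3, "amod"),
     Token(3, "rupa", "NOUN", 1, "obj")],
]

lexicon = build_lexicon(corpus)
for headword, entries in lexicon.items():
    print(headword, entries)
```

On this two-sentence toy corpus the sketch prints three headword entries. The first is `rupa [('ADJ+NOUN', 'crn', 2, 14.0)]`, because *crna rupa* ("black hole") is matched twice. The last is `vidjeti [('VERB+obj:NOUN', 'rupa', 1, 13.415)]`, which scores lower because *rupa* also occurs outside that pairing. Keying the lexicon on the head lemma mirrors the headword organization the abstract describes. A real lexicon would also need pattern-specific frequency thresholds and the relation-aware marginals a full collocation tool would use.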