Venue: International Conference on Text, Speech and Dialogue

Automatic Acquisition of a Slovak Lexicon from a Raw Corpus


Abstract

This paper presents an automatic methodology that we used in an experiment to acquire a morphological lexicon for the Slovak language, together with the lexicon we obtained. This methodology extends and refines approaches which have proven efficient, e.g., for the acquisition of French verbs or Croatian and Russian nouns, adjectives and verbs. It relies only on a raw corpus and on a morphological description of the language. The underlying idea is to build all possible lemmas that can explain the words found in the corpus, according to the morphological description, and to rank these hypothetical lemmas according to their likelihood given the corpus. Of course, hand-validation and iteration of the whole process are needed to achieve a high-quality lexicon, but the human involvement required is orders of magnitude lower than the cost of fully manual development of such a resource. Moreover, this technique can easily be applied to other morphologically rich languages that lack large-coverage lexical resources.
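The core idea in the abstract (generate every lemma that could explain a corpus word, then rank the hypotheses) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two inflection classes in `PARADIGMS` are a simplified toy subset rather than the paper's morphological description, the function names `hypothesize` and `rank` are hypothetical, and the score is a plain paradigm-coverage ratio standing in for the likelihood model, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical toy "morphological description": each inflection class lists
# the suffixes its forms may take; the first suffix is the lemma suffix.
# Simplified for illustration, not a full description of Slovak.
PARADIGMS = {
    "noun_a": ["a", "y", "e", "u", "ou"],
    "noun_o": ["o", "a", "u", "e", "om"],
}


def hypothesize(corpus_words):
    """Map each (lemma, class) hypothesis to the corpus forms it can explain."""
    support = defaultdict(set)
    for word in set(corpus_words):
        for cls, suffixes in PARADIGMS.items():
            for suffix in suffixes:
                # Require a non-empty stem in front of the suffix.
                if word.endswith(suffix) and len(word) > len(suffix):
                    stem = word[: len(word) - len(suffix)]
                    lemma = stem + suffixes[0]
                    support[(lemma, cls)].add(word)
    return support


def rank(support):
    """Rank hypotheses by the share of their paradigm attested in the corpus.

    This coverage ratio is a stand-in heuristic, not the paper's likelihood.
    """
    scored = [
        (len(forms) / len(PARADIGMS[cls]), lemma, cls)
        for (lemma, cls), forms in support.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


corpus = ["žena", "ženy", "žene", "ženu", "ženou", "mesto", "mesta", "mestu"]
ranking = rank(hypothesize(corpus))

for score, lemma, cls in ranking:
    print(f"{score:.1f}  {lemma}  ({cls})")
```

On this toy corpus the hypothesis (`žena`, `noun_a`) ranks first because all five of its forms are attested, while competing hypotheses such as (`ženo`, `noun_o`) still receive a nonzero score. Spurious candidates of that kind are what the hand-validation pass mentioned in the abstract would be expected to filter out.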
