Word-Formation Network for Czech

Abstract

In the present paper, we describe the development of the lexical network DeriNet, which captures core word-formation relations on a set of around 266 thousand Czech lexemes. The network is currently limited to derivational relations because derivation is the most frequent and most productive word-formation process in Czech. This limitation is reflected in the architecture of the network: each lexeme may be linked to just a single base word; composition as well as combined processes (composition with derivation) are thus not included. After a brief summary of theoretical descriptions of Czech derivation and of the state of the art in NLP approaches to Czech derivation, we discuss the linguistic background of the network and introduce its formal structure and the semi-automatic annotation procedure. The network was initialized with a set of lexemes whose existence was supported by corpus evidence. Derivational links were created using three sources of information: links delivered by a tool for morphological analysis, links based on an automatically discovered set of derivation rules, and links based on a grammar-based set of rules. Finally, we propose some research topics which could profit from the existence of such a lexical network.
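
The single-base-word constraint described above means the network forms a forest of derivation trees. The following is a minimal sketch of that idea only, not DeriNet's actual data format or API: the class `DerivationNetwork`, its method names, and the small example family around the Czech verb "učit" (to teach) are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class DerivationNetwork:
    """Toy derivational network (hypothetical, not DeriNet's own API).

    Every lexeme has at most one base word, so the links form a forest.
    """

    def __init__(self) -> None:
        # Maps each lemma to its base word, or None for an underived lexeme.
        self.base_of: Dict[str, Optional[str]] = {}

    def add_lexeme(self, lemma: str) -> None:
        self.base_of.setdefault(lemma, None)

    def add_link(self, derived: str, base: str) -> None:
        """Link `derived` to `base`, enforcing the single-base constraint."""
        self.add_lexeme(derived)
        self.add_lexeme(base)
        if self.base_of[derived] is not None:
            raise ValueError(f"{derived!r} already has a base word")
        # Walk up from `base`; reaching `derived` would mean a cycle.
        node: Optional[str] = base
        while node is not None:
            if node == derived:
                raise ValueError("link would create a cycle")
            node = self.base_of[node]
        self.base_of[derived] = base

    def root(self, lemma: str) -> str:
        """Follow base-word links up to the underived root of the tree."""
        node = lemma
        while self.base_of[node] is not None:
            node = self.base_of[node]
        return node

    def derivatives(self, lemma: str) -> List[str]:
        """Return the lexemes derived directly from `lemma`, sorted."""
        return sorted(d for d, b in self.base_of.items() if b == lemma)


# Illustrative family (chosen for this sketch, not taken from the paper):
# učit (to teach) -> učitel (teacher) -> učitelka, učitelský
net = DerivationNetwork()
net.add_link("učitel", "učit")
net.add_link("učitelka", "učitel")
net.add_link("učitelský", "učitel")
```

Storing only a base-word pointer per lexeme keeps the constraint structural: a compound, which would need two base words, cannot be represented in this sketch, mirroring the limitation the abstract describes.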
