Pacific Asia Conference on Language, Information and Computation

Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus


Abstract

We discuss a process of exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0, to create a lexicon of Czech discourse connectives (CzeDLex). The data format and the data structure of the lexicon are based on a study of similar existing resources and are adapted for a uniform representation of both primary connectives (such as the English "because", "therefore") and secondary connectives (e.g. "for this reason", "this is the reason why"). The main principle adopted for nesting entries in the lexicon is the discourse-semantic type expressed by the given connective word, which enables us to deal with a broad formal variability of connectives. We present a technical solution based on the XML-based Prague Markup Language that allows for an efficient incorporation of the lexicon into the family of Prague treebanks (it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus and queried in the PML-Tree Query engine) and also for interconnecting CzeDLex with existing lexicons in other languages.
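
The nesting principle described in the abstract (entries grouped by connective, with usages keyed by discourse-semantic type) can be illustrated with a minimal Python sketch. This is not the CzeDLex schema or the PML format: the class names, field names, the "reason-result" label and the toy annotation tuples are all assumptions made for illustration, and the English examples from the abstract stand in for Czech connectives.

```python
from dataclasses import dataclass, field


@dataclass
class Usage:
    """One discourse-semantic type expressed by a connective (hypothetical schema)."""
    semantic_type: str
    surface_forms: list = field(default_factory=list)


@dataclass
class Entry:
    """A lexicon entry; 'kind' is 'primary' or 'secondary' (hypothetical field)."""
    lemma: str
    kind: str
    usages: dict = field(default_factory=dict)

    def add_occurrence(self, semantic_type, surface_form):
        # Nest by semantic type first, then collect the formal variants under it.
        usage = self.usages.setdefault(semantic_type, Usage(semantic_type))
        if surface_form not in usage.surface_forms:
            usage.surface_forms.append(surface_form)


def build_lexicon(annotations):
    """Group (lemma, kind, semantic_type, surface_form) tuples into entries."""
    lexicon = {}
    for lemma, kind, semantic_type, surface_form in annotations:
        entry = lexicon.setdefault(lemma, Entry(lemma, kind))
        entry.add_occurrence(semantic_type, surface_form)
    return lexicon


# Toy input invented for this sketch; it is not taken from the treebank.
TOY_ANNOTATIONS = [
    ("because", "primary", "reason-result", "because"),
    ("for this reason", "secondary", "reason-result", "for this reason"),
    ("for this reason", "secondary", "reason-result", "for that reason"),
]

lexicon = build_lexicon(TOY_ANNOTATIONS)
```

Keying usages by semantic type means that differently worded variants of one connective end up under a single usage, which is one plausible reading of how the approach copes with formal variability.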
