Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus


Abstract

We discuss a process of exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0, to create a lexicon of Czech discourse connectives (CzeDLex). The data format and the data structure of the lexicon are based on a study of similar existing resources and are adapted for a uniform representation of both primary connectives (such as the English because, therefore) and secondary connectives (e.g. for this reason, this is the reason why). The main principle adopted for nesting entries in the lexicon is the discourse-semantic type expressed by the given connective word, which enables us to deal with the broad formal variability of connectives. We present a technical solution based on the XML-based Prague Markup Language that allows for an efficient incorporation of the lexicon into the family of Prague treebanks (it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus and queried in the PML-Tree Query engine) and also for interconnecting CzeDLex with existing lexicons in other languages.

Bibliographic details

Source
Pacific Asia Conference on Language, Information and Computation, 2017, pp. 232-240 (9 pages)
Authors
Pavlina Synkova; Magdalena Rysova; Lucie Polakova; Jiri Mirovsky
Original format: PDF

Similar literature

1. The Chinese Discourse TreeBank: a Chinese corpus annotated with discourse relations [J]. Zhou Yuping, Xue Nianwen. Language Resources and Evaluation. 2015, No. 2
2. Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging [J]. Pascal Denis, Benoit Sagot. Language Resources and Evaluation. 2012, No. 4
3. TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style [J]. Deniz Zeyrek, Amalia Mendes, Yulia Grishina. Language Resources and Evaluation. 2020, No. 2
4. Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus [C]. Pavlina Synkova, Magdalena Rysova, Lucie Polakova. Pacific Asia Conference on Language, Information and Computation. 2018
5. Quantitative determinants of prefabs: a corpus-based, experimental study of multiword units in the lexicon [D]. Beckner, Clayton. 2013
6. GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics Informatics to Support Biomedical Information Extraction [O]. So-Yeon Oh, Ji-Hyeon Kim, Seo-Jin Kim. 2018
7. CzeDLex – A Lexicon of Czech Discourse Connectives [O]. Mírovský Jiří, Synková Pavlína, Rysová Magdaléna. 2017
