We discuss the process of exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0, to create a lexicon of Czech discourse connectives (CzeDLex). The data format and the data structure of the lexicon are based on a study of similar existing resources and are adapted for a uniform representation of both primary connectives (such as English because, therefore) and secondary connectives (e.g. for this reason, this is the reason why). The main principle adopted for nesting entries in the lexicon is the discourse-semantic type expressed by the given connective word, which enables us to deal with the broad formal variability of connectives. We present a technical solution based on the (XML-based) Prague Markup Language that allows for an efficient incorporation of the lexicon into the family of Prague treebanks (it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus, and queried in the PML-Tree Query engine) and also for interconnecting CzeDLex with existing lexicons in other languages.