Conference: International Conference on Text, Speech and Dialogue

Morphosyntactic Annotation of Historical Texts. The Making of the Baroque Corpus of Polish

Abstract

In this paper, we present some technical issues in processing 17th- and 18th-century texts for the purpose of building a corpus of that period. We describe a chain of procedures leading from transliterated source texts to morphological annotation of text samples, implemented for building the Baroque Corpus of Polish, a relatively large historical corpus of Polish texts from the 17th and 18th centuries. The procedure consists of automatic transliteration from the original spelling to the modern one, morphological analysis (including the construction of an inflectional dataset for Baroque Polish), and manual morphosyntactic annotation with a dedicated tool. The toolchain is being used to create a small, manually validated subcorpus, which will serve as training data for a stochastic tagger. A larger corpus will then be annotated automatically and made available via the Poliqarp corpus search tool.
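
The automatic transliteration step mentioned in the abstract can be pictured as an ordered list of rewrite rules applied to each token. The following is only a minimal sketch under stated assumptions: the abstract does not give the actual rules, so the two rules below (long s to "s", and word-final "y" after a vowel to "j") and the names `RULES` and `transliterate` are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical, illustrative rules; NOT taken from the paper.
# Each rule is (compiled pattern, replacement); rules are applied in order.
RULES = [
    # The long s of older typography is written as a plain "s" today.
    (re.compile("ſ"), "s"),
    # Assumed rule: word-final "y" after a vowel stands for modern "j".
    (re.compile(r"(?<=[aeou])y\b"), "j"),
]


def transliterate(token: str) -> str:
    """Apply every rewrite rule, in order, to a single token."""
    for pattern, replacement in RULES:
        token = pattern.sub(replacement, token)
    return token


if __name__ == "__main__":
    for word in ["kray", "ſam", "były"]:
        print(word, "->", transliterate(word))
```

Keeping the rules in an ordered list makes the order of application explicit, which matters once one rule's output can feed another rule's pattern. A real system would presumably need many more rules, some of them context-sensitive, plus a list of exceptions.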