International Conference on Language Resources and Evaluation

SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction

Abstract

We describe a new corpus, SLäNDa, the Swedish Literary corpus of Narrative and Dialogue. It contains Swedish literary fiction that has been manually annotated for cited materials, with a focus on dialogue. The annotation covers excerpts from eight Swedish novels written between 1879 and 1940, a period of modernization of the Swedish language. SLäNDa contains annotations for all cited materials that are separate from the main narrative, such as quotations and signs. The main focus is on dialogue, for which we annotate speech segments, speech tags, and speakers. In this paper we describe the annotation protocol and procedure and show that we can reach a high inter-annotator agreement. In total, SLäNDa contains annotations of 44 chapters with over 220K tokens. The annotation identified 4,733 instances of cited material and 1,143 named speaker-speech mappings. The corpus is useful for developing computational tools for different types of analysis of literary narrative and speech. We perform a small pilot study in which we show how our annotation can help in analyzing language change in Swedish. We find that, for a number of common function words, the modern form appears earlier in speech than in narrative.