2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, 2017

Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region

Abstract

The paper describes work in progress by the Izhva Komi language documentation project, which records new spoken-language data, digitizes available recordings, and annotates these multimedia data in order to provide a comprehensive language corpus as a database for future research on and for this endangered and under-described Uralic speech community. Although we work with a spoken variety and within the framework of documentary linguistics, we apply language technology methods and tools that have so far been applied only to normalized written languages. Specifically, we describe a script providing interactivity between ELAN, a graphical user interface tool for annotating and presenting multimodal corpora, and several morphosyntactic analysis modules implemented as finite-state transducers (FSTs) and Constraint Grammar (CG) for rule-based morphosyntactic tagging and disambiguation. Our aim is to challenge current manual approaches to annotating language documentation corpora.
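The abstract does not show the script itself. As a rough, hypothetical illustration of the kind of bridge it describes, the Python sketch below reads the time-aligned annotations of one tier from an ELAN `.eaf` (XML) file, tokenizes them, and attaches candidate morphological readings to each token. The tier name `orth`, the whitespace tokenizer, the toy lexicon and its tags, and the functions `tier_annotations`, `analyze_tier` and `toy_lookup` are all assumptions made for illustration; in the described setup the lookup would instead call an FST analyzer, and a Constraint Grammar step would then prune competing readings.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# An "analyzer" maps one word form to its candidate readings. In the project's
# setup this role is played by FST-based analyzers (followed by Constraint
# Grammar disambiguation); here it is only a hypothetical stand-in.
Analyzer = Callable[[str], List[str]]

# Minimal, illustrative EAF fragment with one time-aligned transcription tier.
# Real ELAN files also contain a HEADER, TIME_ORDER and LINGUISTIC_TYPE entries.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orth">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>ме муна</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Toy lexicon with illustrative tags; NOT the output of any real Komi analyzer.
TOY_LEXICON: Dict[str, List[str]] = {
    "ме": ["ме+Pron+Pers+Sg1+Nom"],
    "муна": ["мунны+V+Prs+Sg1"],
}


def toy_lookup(word: str) -> List[str]:
    """Hypothetical analyzer: look the form up in TOY_LEXICON; '?' marks unknown forms."""
    return TOY_LEXICON.get(word, ["?"])


def tier_annotations(eaf_xml: str, tier_id: str) -> List[Tuple[str, str]]:
    """Return (ANNOTATION_ID, text) pairs for the time-aligned annotations of one tier."""
    root = ET.fromstring(eaf_xml)
    tier = root.find(f"TIER[@TIER_ID='{tier_id}']")
    if tier is None:
        raise KeyError(f"tier {tier_id!r} not found")
    return [
        (ann.get("ANNOTATION_ID", ""), ann.findtext("ANNOTATION_VALUE", default=""))
        for ann in tier.iter("ALIGNABLE_ANNOTATION")
    ]


def analyze_tier(
    eaf_xml: str, tier_id: str, analyzer: Analyzer
) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Tokenize each annotation on the tier and attach the analyzer's readings per token."""
    results: Dict[str, List[Tuple[str, List[str]]]] = {}
    for ann_id, text in tier_annotations(eaf_xml, tier_id):
        tokens = text.split()  # naive whitespace tokenizer (an assumption)
        results[ann_id] = [(tok, analyzer(tok)) for tok in tokens]
    return results


if __name__ == "__main__":
    for ann_id, tokens in analyze_tier(SAMPLE_EAF, "orth", toy_lookup).items():
        for token, readings in tokens:
            print(ann_id, token, readings)
```

Because the analyzer is passed in as a plain callable, a wrapper around a real FST lookup tool could replace `toy_lookup` without touching the EAF-handling code. Writing the readings back into ELAN as a dependent tier would additionally need a tier whose `LINGUISTIC_TYPE_REF` points to a suitable linguistic type in the file, which this sketch omits.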
