International Workshop for Computational Linguistics of Uralic Languages

Languages under the influence: Building a database of Uralic languages

Abstract

For most Uralic languages, there is a lack of systematically collected, consistently transcribed, and morphologically annotated text corpora. This paper summarizes the steps, the preliminary results, and the future directions of building a linguistic corpus of several Uralic languages, namely Tundra Nenets, Udmurt, Synya Khanty, and Surgut Khanty. The experience of building a corpus that contains both old and modern, and both written and oral, data samples is discussed. Principles for data collection strategies for languages with different levels of vitality and endangerment are also discussed. Finally, the methodologies and challenges of data processing, as well as the levels of linguistic annotation, are described in detail.