Developing the Old Tibetan Treebank

Abstract

This paper presents a full procedure for the development of a segmented, POS-tagged and chunk-parsed corpus of Old Tibetan. As an extremely low-resource language, Old Tibetan poses non-trivial problems in every step towards the development of a searchable treebank. We demonstrate, however, that a carefully developed, semi-supervised method of optimising and extending existing tools for Classical Tibetan, as well as creating specific ones for Old Tibetan, can address these issues. We thus also present the very first Tibetan Treebank in a variety of formats to facilitate research in the fields of NLP, historical linguistics and Tibetan Studies.
