Workshop on NLP for Similar Languages, Varieties and Dialects

Building a Corpus for the Zaza-Gorani Language Family

Abstract

Thanks to the growth of local communities and various news websites, along with the increasing accessibility of the Web, some endangered and less-resourced languages have a chance to revive in the information era. The Web can therefore serve as a huge resource from which language corpora can be extracted, enabling researchers to carry out various studies in linguistics and language technology. The Zaza-Gorani language family is a linguistic subgroup of the Northwestern Iranian languages for which no significant corpus is available. Motivated to create one, in this paper we present our endeavour to collect a corpus in the Zazaki and Gorani languages containing over 1.6M and 194k word tokens, respectively. This corpus is publicly available.