Venue: 6th Workshop on Ontologies and Lexical Resources

Ukwabelana-An open-source morphological Zulu corpus


Abstract

Zulu is an indigenous language of South Africa and one of the country's eleven official languages. It has about 11 million speakers. Although it is similar in size to some Western languages, e.g. Swedish, it is considerably under-resourced. This paper presents a new open-source morphological corpus for Zulu, named the Ukwabelana corpus. We describe the agglutinating morphology of Zulu, with its multiple prefixation and suffixation, and introduce our labeling scheme. We then describe the annotation process and explain each individual resource. The resources comprise a list of 10,000 labeled and 100,000 unlabeled word types, 3,000 part-of-speech (POS) tagged and 30,000 raw sentences, a morphological Zulu grammar, and a parsing algorithm that hypothesizes possible word roots and enumerates the parses that conform to the Zulu grammar. We also provide a POS tagger that assigns a grammatical category to each morphologically analyzed word type. As it is hoped that the corpus and all resources will be of benefit to any person doing research on Zulu or on computer-aided analysis of languages, they will be made available in the public domain from
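The parsing idea mentioned in the abstract (hypothesize candidate roots, then enumerate the segmentations that a grammar allows) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny affix inventories, the slot order (subject concord, optional tense marker, root, optional verbal extensions, final vowel), the labels `SC`/`TM`/`ROOT`/`EXT`/`FV`, and the function names `parse` and `split_extensions`. It is not the Ukwabelana grammar, labeling scheme, or parser.

```python
from typing import Iterator, List, Tuple

# Toy affix inventories: illustrative assumptions only, not the corpus grammar.
SUBJECT_CONCORDS = ["ngi", "si", "ba"]
TENSE_MARKERS = ["ya", "zo"]
EXTENSIONS = ["is", "el"]
FINAL_VOWELS = ["a"]

Morpheme = Tuple[str, str]  # (surface form, hypothetical label)


def split_extensions(body: str) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (root, extensions) for every way of peeling zero or more
    extension suffixes off the right edge of `body`."""
    yield body, ()
    for ext in EXTENSIONS:
        if body.endswith(ext) and len(body) > len(ext):
            for root, exts in split_extensions(body[: -len(ext)]):
                yield root, exts + (ext,)


def parse(word: str, min_root_len: int = 2) -> List[List[Morpheme]]:
    """Enumerate all segmentations SC + (TM) + ROOT + EXT* + FV of `word`.
    The root is not looked up in a lexicon: any remaining substring of at
    least `min_root_len` characters is hypothesized as a root."""
    parses: List[List[Morpheme]] = []
    for sc in SUBJECT_CONCORDS:
        if not word.startswith(sc):
            continue
        after_sc = word[len(sc):]
        for tm in [""] + TENSE_MARKERS:  # "" means the tense slot is empty
            if not after_sc.startswith(tm):
                continue
            stem = after_sc[len(tm):]
            for fv in FINAL_VOWELS:
                if not stem.endswith(fv):
                    continue
                body = stem[: -len(fv)]
                for root, exts in split_extensions(body):
                    if len(root) < min_root_len:
                        continue
                    morphemes: List[Morpheme] = [(sc, "SC")]
                    if tm:
                        morphemes.append((tm, "TM"))
                    morphemes.append((root, "ROOT"))
                    morphemes.extend((e, "EXT") for e in exts)
                    morphemes.append((fv, "FV"))
                    parses.append(morphemes)
    return parses


if __name__ == "__main__":
    for p in parse("ngiyahamba"):
        print("-".join(form for form, _ in p))
```

With these toy inventories, `parse("ngiyahamba")` yields both `ngi-ya-hamb-a` and the over-generated `ngi-yahamb-a`. Because roots are only hypothesized, candidate parses like the second one still have to be filtered or ranked by some further step.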
