The Gulf of Guinea Creole Corpora

Abstract

We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea Creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa d'Ambo. We faced the difficulties typical of languages that lack official status: no standard spelling, language variation, a lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. To tackle these problems, the compiled written and transcribed spoken data collected during fieldwork trips were adapted to a normalized spelling that was applied to all four languages. For the corpus compilation we followed corpus linguistics standards. We recorded metadata for each file and added morphosyntactic information based on a part-of-speech tag set designed to deal with the specificities of these languages. The corpora of three of the four Creoles are already available and searchable via a web interface.