Conference: Annual Conference of the International Speech Communication Association

Preliminary Experiments on Unsupervised Word Discovery in Mboshi



Abstract

The need to document thousands of endangered languages encourages collaboration between linguists and computer scientists, with the goal of giving the documentary linguistics community automatic processing tools. The French-German ANR-DFG project Breaking the Unwritten Language Barrier (BULB) aims to develop such tools for three mostly unwritten African languages of the Bantu family. For one of them, Mboshi, a language originating from the "Cuvette" region of the Republic of Congo, we investigate techniques for unsupervised word discovery from an unsegmented stream of phonemes. We compare different models and algorithms, both monolingual and bilingual, on a new corpus in Mboshi and French, and discuss various ways to represent the data with suitable granularity. An additional French-English corpus allows us to contrast the results obtained on Mboshi and to experiment with more data.
