Conference: International conference on computational linguistics

Collecting Bilingual Audio in Remote Indigenous Communities

Abstract

Most of the world's languages are under-resourced, and most under-resourced languages lack a writing system and literary tradition. As these languages fall out of use, we lose important sources of data that contribute to our understanding of human language. The first, urgent step is to collect and orally translate a large quantity of spoken language. This can be digitally archived and later transcribed, annotated, and subjected to the full range of speech and language processing tasks, at any time in future. We have been investigating a mobile application for recording and translating unwritten languages. We visited indigenous communities in Brazil and Nepal and taught people to use smartphones for recording spoken language and for orally interpreting it into the national language, and collected bilingual phrase-aligned speech recordings. In spite of several technical and social issues, we found that the technology enabled an effective workflow for speech data collection. Based on this experience, we argue that the use of special-purpose software on smartphones is an effective and scalable method for large-scale collection of bilingual audio, and ultimately bilingual text, for languages spoken in remote indigenous communities.
