Building transcribed speech corpora quickly and cheaply for many languages

Abstract

We present a system for quickly and cheaply building transcribed speech corpora containing utterances from many speakers in a variety of acoustic conditions. The system consists of a client application running on an Android mobile device with an intermittent Internet connection to a server. The client application collects demographic information about the speaker, fetches textual prompts from the server for the speaker to read, records the speaker's voice, and uploads the audio and associated metadata to the server. The system has so far been used to collect over 3000 hours of transcribed audio in 17 languages around the world.
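The client workflow described in the abstract (cache prompts, record while offline, upload when a connection returns) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `Client`, `Recording`, and `FakeServer`, the methods `get_prompts` and `store`, and the `age_range` metadata field are all hypothetical, and audio capture is replaced by a placeholder function.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Recording:
    prompt: str     # text the speaker was asked to read
    audio: bytes    # recorded audio (placeholder bytes in this sketch)
    speaker: dict   # demographic metadata (hypothetical fields)


@dataclass
class Client:
    speaker: dict
    prompts: deque = field(default_factory=deque)  # prompts cached from the server
    pending: deque = field(default_factory=deque)  # recordings awaiting upload

    def fetch_prompts(self, server, online, batch=3):
        """Refill the local prompt cache when a connection is available."""
        if online:
            self.prompts.extend(server.get_prompts(batch))

    def record_next(self, capture):
        """Record the next cached prompt; needs no connection."""
        if not self.prompts:
            return None
        prompt = self.prompts.popleft()
        rec = Recording(prompt, capture(prompt), self.speaker)
        self.pending.append(rec)
        return rec

    def upload_pending(self, server, online):
        """Upload queued recordings; they stay queued while offline."""
        sent = 0
        while online and self.pending:
            server.store(self.pending.popleft())
            sent += 1
        return sent


class FakeServer:
    """In-memory stand-in for the server (assumed interface)."""

    def __init__(self, prompts):
        self._prompts = deque(prompts)
        self.stored = []

    def get_prompts(self, n):
        count = min(n, len(self._prompts))
        return [self._prompts.popleft() for _ in range(count)]

    def store(self, recording):
        self.stored.append(recording)


server = FakeServer(["hello world", "good morning"])
client = Client(speaker={"age_range": "25-34"})
client.fetch_prompts(server, online=True)

# Placeholder capture: encodes the prompt text instead of recording audio.
client.record_next(lambda p: p.encode())
client.record_next(lambda p: p.encode())

client.upload_pending(server, online=False)  # nothing is sent while offline
client.upload_pending(server, online=True)   # the queue drains once online
```

Keeping both the prompt cache and the upload queue on the device is one way to tolerate an intermittent connection: recording never blocks on the network, and uploads are deferred until the server is reachable.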

Bibliographic record

Source: Annual Conference of the International Speech Communication Association (INTERSPEECH 2010) | 2011 | pp. 1914-1917 | 4 pages
Authors: Thad Hughes; Kaisuke Nakajima; Linne Ha; Atul Vasu; Pedro Moreno; Mike LeBeau
Original format: PDF
CLC classification: Communications
Keywords: speech corpora; speech recognition; internationalization

Similar literature

1. Transcriber: Development and use of a tool for assisting speech corpora production [J] . Claude Barras, Edouard Geoffrois, Zhibiao Wu, Speech Communication . 2001, No. 1-2

2. Collecting and evaluating speech recognition corpora for 11 South African languages [J] . Jaco Badenhorst, Charl van Heerden, Marelie Davel, Language Resources and Evaluation . 2011, No. 3

3. Development of speech corpora for speaker recognition research and evaluation in Indian languages [J] . Hemant A. Patil, T.K. Basu, International Journal of Speech Technology . 2008, No. 1

4. Building transcribed speech corpora quickly and cheaply for many languages [C] . Thad Hughes, Kaisuke Nakajima, Linne Ha, Annual conference of the International Speech Communication Association . 2010

5. Interaction, authenticity and spoken corpora: Building teaching materials for adult English language learners. [D] . Cunningham, Courtney. 2010

6. Building Gold Standard Corpora for Medical Natural Language Processing Tasks [O] . Louise Deleger, Qi Li, Todd Lingren, 2012

7. Transcribing Southern Min speech corpora with a web-based language learning system [O] . Jun Cai, Jacques Feldmar, Yves Laprie, 2008

