MULTILINGUAL SPEECH DATABASES AT LDC

Abstract

As multilingual products and technology grow in importance, the Linguistic Data Consortium (LDC) intends to provide the resources needed for research and development activities, especially in telephone-based, small-vocabulary recognition applications; language identification research; and large vocabulary continuous speech recognition research.

The POLYPHONE corpora, a multilingual "database of databases," are specifically designed to meet the needs of telephone application development and testing. Data sets from many of the world's commercially important languages will be available within the next few years.

Language identification corpora will be large sets of spontaneous telephone speech in several languages with a wide variety of speakers, channels, and handsets. One corpus is now available, and current plans call for corpora of increasing size and complexity over the next few years.

Large vocabulary speech recognition requires transcribed speech, pronouncing dictionaries, and language models. To fill this need, LDC will use the unattended computer-controlled collection methods developed for SWITCHBOARD to create several similar corpora, each about one-tenth the size of SWITCHBOARD, in other languages. Text corpora sufficient to create useful language models will be collected and distributed as well. Finally, pronouncing dictionaries covering the vocabulary of both transcripts and texts will be produced and made available.

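Bibliographic record

  • Source
    Human language technology | 1994 | pp. 23-26 | 4 pages
  • Conference location: Plainsboro, NJ (US)
  • Author

    John J. Godfrey

  • Author affiliation

    Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA 19104

  • Conference organizer
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Computer software
  • Keywords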
