
Improving the Capacity of Language Recognition Systems to Handle Rare Languages Using Radio Broadcast Data


Abstract

The project is divided into two phases: the first is planned for May 2008 to October 2008, and the second for November 2008 to April 2009. It comprises three work packages (WP). The project relies on the Voice of America (VOA) data collected by LDC over the past several years. The VOA data will need to be completed with the available meta-information, especially about the language(s) it contains. The next step will consist of cleaning the data and selecting relevant speech, since automatically acquired data is known to be quite dirty for the purposes of LRE: (1) Automatic segmentation into speech, music and noise segments, of which only speech will be retained. Speech/music segmentation was the topic of a diploma thesis completed at our department (Hovorka 2006) and is available for use in this project. (2) Voice activity detection (VAD), performed by our phoneme recognizer (Schwarz 2006) with all phoneme classes linked to a single 'speech' class. This setup has been used successfully in a wide range of applications, such as speaker recognition, language recognition, speech transcription and spoken term detection, and has been evaluated in several NIST evaluations. (3) Detection of telephone conversations in the data. In this project, we will mainly investigate data that is as close as possible to the target domain: conversational telephone speech (CTS). We will therefore concentrate on segments with detected telephone speech (people calling in to the broadcast), as we believe these should correspond best to CTS. Initial work on Thai done for NIST LRE 2007 yielded 8 hours of telephone conversations from approximately 400 hours of VOA data downloaded from the VOA Internet archive.
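Step (2) above, VAD obtained by linking all phoneme classes to one 'speech' class, can be illustrated with a minimal sketch. It is not the authors' implementation: the non-speech label names, the 30-frame minimum segment length and the function names are all assumptions made for the example, and the real phoneme recognizer's label set is not given in the abstract. The last helper simply restates the reported Thai yield as a fraction.

```python
from typing import List, Sequence, Tuple

# Hypothetical non-speech labels; the actual recognizer's inventory is not
# described in the abstract.
NON_SPEECH = {"sil", "noise", "music"}


def speech_segments(
    frame_labels: Sequence[str], min_frames: int = 30
) -> List[Tuple[int, int]]:
    """Collapse per-frame phoneme labels into speech segments.

    Every label outside NON_SPEECH counts as 'speech'. Returns
    (start_frame, end_frame) pairs with end_frame exclusive, keeping only
    runs of at least min_frames frames (an assumed threshold).
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(frame_labels):
        is_speech = label not in NON_SPEECH
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start, i))  # a speech run ends
            start = None
    if start is not None:
        segments.append((start, len(frame_labels)))
    return [(s, e) for s, e in segments if e - s >= min_frames]


def yield_fraction(kept_hours: float, total_hours: float) -> float:
    """Fraction of the collected audio that survives the selection."""
    return kept_hours / total_hours


if __name__ == "__main__":
    labels = ["sil"] * 5 + ["a"] * 40 + ["music"] * 10 + ["k"] * 3 + ["sil"] * 2
    print(speech_segments(labels))
    print(yield_fraction(8, 400))
```

With the figures reported for Thai, 8 hours out of roughly 400 hours is a yield of about 2%, which shows how much raw broadcast audio is needed per hour of usable telephone speech.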
