
Speech Corpus Generation from DVDs of Movies and TV Series


Abstract

A speech corpus is a database of audio files containing spoken words or sentences together with their text transcriptions. In this work we present a data collection system for creating speech corpora from DVDs of movies and TV series. Generating a corpus from these DVDs is a significantly lower-cost solution than the conventional way of obtaining a speech corpus. It also takes less time to collect the data and process it into a corpus. To perform this operation, the Data Collection Toolkit is introduced. The toolkit is an application developed in C# on .NET Framework 3.5 using Visual Studio 2008. Throughout the presented work, the toolkit is used to show how it can simplify the process of creating a corpus.
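To make the definition of a speech corpus concrete, the following is a minimal Python sketch of a corpus as audio files paired with text transcriptions, written out as a CSV manifest. It is not the Data Collection Toolkit, which is a C# application whose internals the abstract does not describe. The names `Utterance`, `write_manifest`, and `total_duration`, the `duration_s` field, and the example clip paths are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio clip and its transcription (hypothetical layout)."""
    audio_path: str    # path to the audio file holding the spoken words
    duration_s: float  # clip length in seconds (assumed field, not from the source)
    transcript: str    # text transcription of what is spoken


def write_manifest(utterances, out):
    """Write the corpus index as CSV to a text stream, one row per utterance."""
    writer = csv.writer(out)
    writer.writerow(["audio_path", "duration_s", "transcript"])
    for u in utterances:
        writer.writerow([u.audio_path, f"{u.duration_s:.3f}", u.transcript])


def total_duration(utterances):
    """Total amount of speech in the corpus, in seconds."""
    return sum(u.duration_s for u in utterances)


if __name__ == "__main__":
    corpus = [
        Utterance("clips/0001.wav", 1.5, "Hello there."),
        Utterance("clips/0002.wav", 2.0, "Yes, of course."),
    ]
    buffer = io.StringIO()
    write_manifest(corpus, buffer)
    print(buffer.getvalue())
    print("total seconds:", total_duration(corpus))
```

A flat manifest like this keeps the audio and the transcriptions separately inspectable, which is one common way to organize such a database; the source does not state which layout the toolkit actually produces.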
