Conference: IEEE East-West Design and Test Symposium

Building Test Speech Dataset on Russian Language for Spoken Document Retrieval Task

Abstract

This article presents a technique for creating a speech dataset intended for testing spoken document retrieval methods. The dataset includes radio news audio files with Russian speech, text files containing the spoken words, text files containing the words recognized by CMU Pocketsphinx, and a set of queries with the relevant documents indicated. Query words in the set are labeled with the type of recognition error identified: word replacement, word distortion, word split, or word deletion. The relevance of documents to queries was judged by an expert.
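
The abstract does not specify a file layout or schema, so the following is only a minimal sketch of how the described components could be modelled in Python. All class, field, and function names (`Document`, `Query`, `RecognitionError`, `recall`) are hypothetical and not taken from the paper; the recall helper is a standard retrieval measure added for illustration, not a metric the authors are known to use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class RecognitionError(Enum):
    """The four error types named in the abstract."""
    REPLACEMENT = "word replacement"
    DISTORTION = "word distortion"
    SPLIT = "word split"
    DELETION = "word deletion"


@dataclass
class Document:
    """One radio news item (hypothetical layout)."""
    doc_id: str
    audio_path: str   # audio file with Russian speech
    transcript: str   # text of the spoken words
    recognized: str   # text produced by the speech recognizer


@dataclass
class Query:
    """A query with per-word error labels and expert relevance judgments."""
    words: list[str]
    error_labels: dict[str, RecognitionError] = field(default_factory=dict)
    relevant_doc_ids: set[str] = field(default_factory=set)


def recall(query: Query, retrieved_ids: list[str]) -> float:
    """Fraction of expert-marked relevant documents that were retrieved."""
    if not query.relevant_doc_ids:
        return 0.0
    hits = query.relevant_doc_ids & set(retrieved_ids)
    return len(hits) / len(query.relevant_doc_ids)
```

With a structure like this, a retrieval method under test could be scored per query and the scores grouped by the error type attached to the query words.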