Conference: IEEE East-West Design & Test Symposium
Building Test Speech Dataset on Russian Language for Spoken Document Retrieval Task

Abstract

The article presents a technique for creating a speech dataset used to test spoken document retrieval methods. The dataset includes radio news audio files with speech in Russian, text files containing the spoken words, text files containing the words recognized by CMU Pocketsphinx, and a set of queries with the relevant documents indicated. Query words in the set are labeled with the types of recognition errors that affect them: word replacement, word distortion, word split, and word deletion. The dataset also contains expert judgments of which documents are relevant to each query.
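
The components listed in the abstract can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, and the precision/recall scoring are assumptions made for this example, not the authors' file layout or evaluation procedure, which the abstract does not describe.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecognitionError(Enum):
    # The four error types named in the abstract.
    REPLACEMENT = "word replacement"
    DISTORTION = "word distortion"
    SPLIT = "word split"
    DELETION = "word deletion"


@dataclass
class Document:
    # Hypothetical record for one radio news item.
    doc_id: str
    audio_path: str   # audio file with Russian speech
    transcript: str   # the words actually spoken
    recognized: str   # recognizer output (e.g. from CMU Pocketsphinx)


@dataclass
class Query:
    # Each query word maps to the error types labeled for it (may be empty).
    word_errors: dict[str, set[RecognitionError]]
    # Expert relevance judgments, stored as document ids (assumed format).
    relevant_doc_ids: set[str] = field(default_factory=set)


def precision_recall(retrieved: set[str], query: Query) -> tuple[float, float]:
    """Score one retrieval result against the expert relevance labels."""
    hits = len(retrieved & query.relevant_doc_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(query.relevant_doc_ids) if query.relevant_doc_ids else 0.0
    return precision, recall


# Usage with made-up ids: one of two retrieved documents is relevant,
# and one of two relevant documents was found.
query = Query(
    word_errors={"новости": {RecognitionError.DISTORTION}},
    relevant_doc_ids={"d1", "d2"},
)
precision, recall = precision_recall({"d1", "d3"}, query)
```

Keeping the error labels on the query words, rather than on the documents, mirrors the abstract's wording and would let a retrieval method be scored separately per error type.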