Journal: 電子情報通信学会技術研究報告. 音声. Speech

News spoken document retrieval by considering out-of-vocabulary keywords

Abstract

This paper describes a Japanese spoken document retrieval system that is robust to documents containing out-of-vocabulary (OOV) words. One approach to spoken document retrieval is to automatically transcribe spoken documents into word sequences, which can be directly matched against queries. In this approach, however, a serious problem arises when both the queries and the documents include OOV keywords: matching through OOV keywords always fails, because an OOV keyword cannot be transcribed as a word. To avoid this problem, we use both word-based indexing for in-vocabulary keywords and syllable-based indexing for OOV keywords. Syllable-based indexes are created from a transcript using a recognizer with OOV detection processing. Combining word-based and syllable-based indexing retains the advantages of both indexes. When a keyword in a query is an in-vocabulary word, it is matched directly against the word-based index, so neither retrieval speed nor accuracy is harmed. When a keyword is OOV, either its syllable form is approximately matched against the syllable-form index, or concatenated HMMs corresponding to the syllable-form keyword are matched against the spoken documents by a word-spotting technique. Experimental results clearly showed that our approach is effective and robust in retrieving spoken documents with queries that include OOV keywords.
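
The routing between the two indexes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `HybridIndex`, the function `min_substring_distance`, the `max_errors` threshold, and the romanized syllable tokens are all hypothetical, and the approximate match is assumed here to be an edit distance between the keyword's syllables and any contiguous stretch of a document's syllable transcript. The HMM word-spotting alternative and the recognizer's OOV detection are not modeled.

```python
from collections import defaultdict


def min_substring_distance(pattern, text):
    """Smallest edit distance between `pattern` and any contiguous
    sub-sequence of `text` (both are lists of syllable tokens)."""
    # Row 0 is all zeros: a match may start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, 1):
        cur = [i]  # i pattern syllables against an empty text prefix
        for j, t in enumerate(text, 1):
            cur.append(min(prev[j] + 1,               # drop a pattern syllable
                           cur[j - 1] + 1,            # skip a text syllable
                           prev[j - 1] + (p != t)))   # match or substitute
        prev = cur
    # A match may also end at any position in the text.
    return min(prev)


class HybridIndex:
    """Word index for in-vocabulary keywords, syllable index for OOV ones.
    Hypothetical structure, assumed for illustration only."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.word_index = defaultdict(set)  # word -> ids of documents
        self.syllable_docs = {}             # document id -> syllable list

    def add(self, doc_id, words, syllables):
        for word in words:
            self.word_index[word].add(doc_id)
        self.syllable_docs[doc_id] = list(syllables)

    def search(self, keyword, keyword_syllables, max_errors=1):
        if keyword in self.vocabulary:
            # In-vocabulary: exact lookup, no approximate matching needed.
            return set(self.word_index.get(keyword, set()))
        # OOV: approximate match of the syllable form against each transcript.
        return {doc_id
                for doc_id, syllables in self.syllable_docs.items()
                if min_substring_distance(keyword_syllables, syllables) <= max_errors}
```

In this sketch an in-vocabulary keyword costs a single dictionary lookup, while an OOV keyword triggers a scan over every syllable transcript, which mirrors the abstract's point that in-vocabulary retrieval is not slowed by the OOV path. Raising `max_errors` tolerates more syllable recognition errors at the cost of more false matches.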