MICAI 2008: Advances in Artificial Intelligence

A Soundex-Based Approach for Spoken Document Retrieval

Abstract

Current storage and processing facilities have led to the emergence of many multimedia repositories and, consequently, to the need for new approaches to information retrieval. In particular, spoken document retrieval is a very complex task, since existing speech recognition systems tend to generate several transcription errors (such as word substitutions, insertions and deletions). To deal with these errors, this paper proposes an enriched document representation based on a phonetic codification of the automatic transcriptions. This representation aims to reduce the impact of transcription errors by representing words with similar pronunciations through the same phonetic code. Experimental results on the CL-SR corpus from CLEF 2007 (which includes 33 test topics and 8,104 English interviews) are encouraging: our method achieved a mean average precision of 0.0795, outperforming all but one of the systems evaluated at this forum.
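
The idea of mapping similar-sounding words to one phonetic code can be illustrated with a minimal Python sketch of the classic American Soundex scheme. The abstract does not say which Soundex variant the paper uses or how the codes are combined with the original words, so the `soundex` function and the `enrich` helper below (which simply appends each word's code to the token list) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of classic American Soundex (assumed variant; the
# abstract does not specify the exact coding rules used in the paper).
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(word, length=4):
    """Return the Soundex code of `word`, or "" if it has no ASCII letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                    # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not break a run of equal codes
        digit = CODES.get(ch, "")        # vowels and Y map to ""
        if digit and digit != prev:
            code += digit
        prev = digit                     # a vowel resets the run
        if len(code) == length:
            break
    return code.ljust(length, "0")       # pad short codes with zeros


def enrich(tokens):
    """Hypothetical enrichment: keep the words and append their codes."""
    return list(tokens) + [soundex(t) for t in tokens if soundex(t)]


if __name__ == "__main__":
    # A recognizer that substitutes "their" for "there" still yields
    # the same phonetic code for both words.
    print(soundex("there"), soundex("their"))
    print(enrich(["spoken", "document"]))
```

Under a representation like this, a query term and a misrecognized transcript word can still match on their shared code even when the surface forms differ.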
