IEEE Transactions on Speech and Audio Processing

Vocabulary-Independent Indexing of Spontaneous Speech



Abstract

We present a system for vocabulary-independent indexing of spontaneous speech, i.e., neither do we know the vocabulary of a speech recording nor can we predict which query terms a user is going to search for. The technique can be applied to information retrieval, information extraction, and data mining. Our specific target is search in recorded conversations in the office/information-worker scenario: teleconferences, meetings, presentations, and voice mails. The focus of this paper is on how to index phonetic lattices. We will show that an index should provide expected term frequencies (ETFs) of query terms. Since, at indexing time, it is unknown which phoneme sequences constitute valid query terms, we will introduce an approximation of the ETF of a query's phoneme sequence by $M$-gram phoneme language models, which are estimated on lattices and organized in an inverted-index-like structure for fast access. We will discuss ranking, estimation, and integration of phoneme/word hybrid approaches. Compared with an unindexed baseline without approximation, our approximation leads to only a 3.4% relative loss of search accuracy on the Linguistic Data Consortium (LDC) voicemail task. We also propose a two-stage method for locating individual keyword occurrences using the above method as a fast match. A 20-times speedup is achieved over unindexed search at an accuracy loss of under 2 points. Last, we will briefly introduce a prototype applet based on the above techniques.
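To make the indexing idea in the abstract concrete, here is a minimal Python sketch under several assumptions that do not come from the paper. A weighted list of phoneme hypotheses (posterior, phoneme sequence) stands in for a real phonetic lattice. The chain-rule formula in `approx_etf` is one plausible way to approximate an ETF from $M$-gram expected counts and may differ from the authors' exact estimator. All function names, the order `M = 3`, and the toy data are hypothetical.

```python
from collections import defaultdict

M = 3  # M-gram order; a hypothetical choice for this sketch


def expected_counts(hypotheses, max_order=M):
    """Expected counts of all phoneme n-grams up to max_order.

    `hypotheses` is a list of (posterior, phoneme_list) pairs; this is an
    assumed stand-in for a lattice, whose paths would carry such posteriors.
    """
    counts = defaultdict(float)
    for posterior, phones in hypotheses:
        for order in range(1, max_order + 1):
            for i in range(len(phones) - order + 1):
                counts[tuple(phones[i:i + order])] += posterior
    return counts


def build_index(docs):
    """Inverted-index-like map: n-gram -> {doc_id: expected count}."""
    index = defaultdict(dict)
    for doc_id, hypotheses in docs.items():
        for ngram, count in expected_counts(hypotheses).items():
            index[ngram][doc_id] = count
    return index


def approx_etf(index, doc_id, query, m=M):
    """Approximate the expected term frequency of `query` in one document.

    Assumed formula (not taken from the paper): the expected count of the
    first m phonemes, multiplied by M-gram conditional probabilities, each
    estimated as a ratio of expected counts.
    """
    def count(ngram):
        return index.get(tuple(ngram), {}).get(doc_id, 0.0)

    if len(query) <= m:
        return count(query)  # short queries are stored directly
    etf = count(query[:m])
    for i in range(m, len(query)):
        history = count(query[i - m + 1:i])
        if history == 0.0:
            return 0.0  # unseen history: the query cannot occur
        etf *= count(query[i - m + 1:i + 1]) / history
    return etf


def rank(index, doc_ids, query, m=M):
    """Rank documents by approximate ETF, highest first."""
    scored = [(approx_etf(index, d, query, m), d) for d in doc_ids]
    return sorted(scored, reverse=True)


# Toy data: two "voicemails", each with posterior-weighted hypotheses.
docs = {
    "vm1": [(0.6, ["h", "eh", "l", "ow"]), (0.4, ["h", "ah", "l", "ow"])],
    "vm2": [(1.0, ["y", "eh", "s"])],
}
index = build_index(docs)
print(rank(index, list(docs), ["h", "eh", "l", "ow"]))
```

Because the index stores only n-grams up to order `M`, its size does not depend on which words a user might later search for, which is the point of the vocabulary-independent design; the price is that scores for queries longer than `M` phonemes are approximations.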
