IEEE Transactions on Audio, Speech, and Language Processing

Model-Based Unsupervised Spoken Term Detection with Spoken Queries

Abstract

We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data. This work explores the possibility of migrating unsupervised STD from DTW-based to model-based approaches. The proposed framework consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated acoustic patterns are captured and modeled with acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed in which documents are represented as ASM sequences that are matched against the query frames. In this way, not only do the ASMs better model the signal distributions and time trajectories of speech, but documents are also represented by far fewer states than frames, which greatly reduces the computational load. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for this matching process to handle speaking-rate distortion. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed within the pseudo relevance feedback (PRF) framework: a likelihood ratio evaluated with query and anti-query HMMs, trained on pseudo-relevant and pseudo-irrelevant examples respectively, is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps of the proposed framework. Experimental results on a Mandarin broadcast news corpus show an absolute improvement of 14.2% in mean average precision (MAP) with a 77% reduction in CPU time compared with the segmental DTW approach. Consistent improvements were also found on the TIMIT and MediaEval 2011 Spoken Web Search corpora.
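The query matching step can be illustrated with a minimal sketch of duration-constrained alignment between query frames and document states. It is not the authors' DC-Vite algorithm. It assumes that each document state is summarized by a single mean vector (a hypothetical stand-in for an ASM state model), uses squared Euclidean distance in place of a negative log-likelihood, and lets every matched state absorb between `d_min` and `d_max` consecutive query frames, with those bounds chosen arbitrarily here. The dynamic program costs O(T * S * (d_max - d_min + 1)) for T query frames and S document states, so representing a document with far fewer states than frames directly shrinks the search.

```python
import numpy as np

def duration_constrained_match(query, state_means, d_min=1, d_max=3):
    """Align all query frames to a contiguous run of document states.

    query: (T, D) query frames. state_means: (S, D), one vector per document
    state (hypothetical stand-in for ASM state models). Each matched state
    absorbs between d_min and d_max consecutive query frames.
    Returns (cost, first_state, last_state); cost is inf if no alignment fits.
    """
    T, S = len(query), len(state_means)
    # Frame-to-state cost: squared Euclidean distance (stand-in for -log likelihood)
    dist = ((query[:, None, :] - state_means[None, :, :]) ** 2).sum(axis=-1)
    # cum[t, s] = cost of assigning query frames 0..t-1 to state s
    cum = np.vstack([np.zeros((1, S)), np.cumsum(dist, axis=0)])
    # best[t, s]: min cost with the first t frames consumed, last segment on state s
    best = np.full((T + 1, S), np.inf)
    start = np.zeros((T + 1, S), dtype=int)  # first document state of that path
    for t in range(d_min, T + 1):
        for s in range(S):
            for d in range(d_min, min(d_max, t) + 1):
                seg = cum[t, s] - cum[t - d, s]  # frames t-d..t-1 on state s
                if t - d == 0:
                    # The match may begin at any document state
                    cand, first = seg, s
                elif s > 0 and np.isfinite(best[t - d, s - 1]):
                    # Otherwise the previous segment must sit on state s-1
                    cand, first = best[t - d, s - 1] + seg, start[t - d, s - 1]
                else:
                    continue
                if cand < best[t, s]:
                    best[t, s], start[t, s] = cand, first
    last = int(np.argmin(best[T]))
    return float(best[T, last]), int(start[T, last]), last
```

For instance, a query whose frames sit at 10, 10, 20, 20, 20 aligns at zero cost to two consecutive states with means 10 and 20, the first absorbing two frames and the second three. Tightening `d_max` to 2 would force at least one frame onto a mismatched state, which is how a duration constraint limits how far the alignment can stretch to absorb speaking-rate differences.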
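The PRF-based verification can likewise be sketched. In this simplified version, a single diagonal Gaussian (an assumption standing in for the paper's query and anti-query HMMs) is fitted to the frames of the top-`k` first-pass hypotheses, treated as pseudo-relevant, and another to the bottom-`k`, treated as pseudo-irrelevant. Each hypothesis is then rescored by the difference of its average per-frame log-likelihoods under the two models. The function names, `k`, and the variance floor are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(frames, floor=1e-3):
    # Mean and variance-floored diagonal covariance of a stack of frames
    return frames.mean(axis=0), np.maximum(frames.var(axis=0), floor)

def avg_loglik(frames, mean, var):
    # Average per-frame log-likelihood under a diagonal Gaussian
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return float(ll.mean())

def plr_rescore(hypotheses, first_pass_scores, k=2):
    """Rescore hypotheses with a pseudo likelihood ratio.

    hypotheses: list of (n_i, D) arrays, the frames of each detected segment.
    first_pass_scores: higher means more likely to contain the query.
    """
    order = np.argsort(first_pass_scores)[::-1]  # best first
    rel = np.vstack([hypotheses[i] for i in order[:k]])   # pseudo-relevant
    irr = np.vstack([hypotheses[i] for i in order[-k:]])  # pseudo-irrelevant
    q_model, anti_model = fit_diag_gaussian(rel), fit_diag_gaussian(irr)
    # Log-likelihood ratio: query model vs. anti-query model
    return [avg_loglik(h, *q_model) - avg_loglik(h, *anti_model) for h in hypotheses]
```

Because the ratio is learned from the first pass's own top and bottom results, a hypothesis that resembles the confident hits can move up even if its first-pass score was mediocre. If the top-`k` list is itself mostly wrong, the feedback reinforces the error, which is the usual risk of pseudo relevance feedback.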
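For comparison, frame-level DTW matching of a spoken query against a document can be sketched as subsequence DTW, in which the match may start and end at any document frame. This is a simpler relative of the segmental DTW baseline mentioned above, not that algorithm itself, and the Euclidean frame distance is an assumption (posteriorgram-based systems often use other distances). Its cost grows as O(Q * N) for Q query frames and N document frames, which is the frame-level load that state-level matching aims to reduce.

```python
import numpy as np

def subsequence_dtw(query, doc):
    """Return (accumulated cost, end frame) of the best match of query inside doc.

    query: (Q, D) feature frames; doc: (N, D) feature frames.
    Frame distance is Euclidean (an assumption for illustration).
    """
    q_len, n_len = len(query), len(doc)
    # Pairwise frame distances, shape (Q, N)
    dist = np.linalg.norm(query[:, None, :] - doc[None, :, :], axis=-1)
    acc = np.full((q_len, n_len), np.inf)
    # Free start: the first query frame may align to any document frame
    acc[0, :] = dist[0, :]
    for i in range(1, q_len):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, n_len):
            # Standard steps: vertical, horizontal, diagonal
            acc[i, j] = dist[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Free end: the last query frame may align to any document frame
    end = int(np.argmin(acc[-1]))
    return float(acc[-1, end]), end
```

Searching a 10-frame document whose frames are 0 through 9 for the query 3, 4, 5 returns a cost of 0 ending at frame 5. In practice the accumulated cost is usually normalized by path length before hypotheses from different documents are compared.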
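The headline result is reported in mean average precision (MAP). The sketch below shows how the metric works. Average precision (AP) averages the precision at each rank where a relevant hit occurs, and MAP averages AP over queries. By default, AP here divides by the number of relevant hits retrieved. Passing `n_relevant` instead divides by the total number of relevant occurrences, which is the usual definition when some occurrences are never retrieved. An absolute gain of 14.2% means that the MAP value itself rises by 0.142, not that it grows by 14.2% relative to the baseline.

```python
def average_precision(ranked_relevance, n_relevant=None):
    """AP of one ranked list; ranked_relevance[i] is True if the i-th hit is relevant."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    total = n_relevant if n_relevant is not None else hits
    return precision_sum / total if total else 0.0

def mean_average_precision(per_query_rankings):
    # Average the per-query AP values
    return sum(average_precision(r) for r in per_query_rankings) / len(per_query_rankings)
```

For example, the ranking relevant, irrelevant, relevant gives AP = (1/1 + 2/3) / 2, about 0.833.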