Model-Based Unsupervised Spoken Term Detection with Spoken Queries

Chan C.-A.; Lee L.-S.

Abstract

We present a set of model-based approaches for unsupervised spoken term detection (STD) with spoken queries that require neither speech recognition nor annotated data. This work shows the possibility of migrating from approaches based on dynamic time warping (DTW) to model-based approaches for unsupervised STD. The proposed approach consists of three components: self-organizing models, query matching, and query modeling. To construct the self-organizing models, repeated patterns are captured and modeled using acoustic segment models (ASMs). In the query matching phase, a document state matching (DSM) approach is proposed that represents documents as ASM sequences, which are matched to the query frames. In this way, the ASMs better model the signal distributions and time trajectories of speech, and because the documents have far fewer states than frames, the computational load is much lower. A novel duration-constrained Viterbi (DC-Vite) algorithm is further proposed for this matching process to handle the speaking rate distortion problem. In the query modeling phase, a pseudo likelihood ratio (PLR) approach is proposed within the pseudo relevance feedback (PRF) framework: a likelihood ratio evaluated with query and anti-query HMMs, trained on pseudo-relevant and pseudo-irrelevant examples respectively, is used to verify the detected spoken term hypotheses. The proposed framework demonstrates the usefulness of ASMs for STD in zero-resource settings and the potential of an instantly responding STD system using ASM indexing. The best performance is achieved by integrating DTW-based approaches into the rescoring steps of the proposed framework. Experimental results show an absolute improvement of 14.2% in mean average precision together with a 77% reduction in CPU time compared with the segmental DTW approach on a Mandarin broadcast news corpus. Consistent improvements were found on the TIMIT and MediaEval 2011 Spoken Web Search corpora.
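
The evaluation metric named in the abstract, mean average precision (MAP), can be illustrated with a short sketch. The Python below is not taken from the paper: the function names, query IDs, and utterance IDs are hypothetical, and it assumes the common definition in which the average precision of a query is the mean of the precision values at the ranks where relevant items appear. The paper's exact evaluation setup may differ in detail.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    `ranked` is the system's result list, best match first; `relevant` is the
    set of items that truly contain the query term. Relevant items that are
    never retrieved contribute zero precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `truth`."""
    if not truth:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in truth.items())
    return total / len(truth)


if __name__ == "__main__":
    # Hypothetical toy data: two spoken queries, utterance IDs as results.
    runs = {"q1": ["u3", "u1", "u7", "u2"], "q2": ["u5", "u6"]}
    truth = {"q1": {"u1", "u2"}, "q2": {"u5"}}
    print(mean_average_precision(runs, truth))
```

Read this way, an absolute improvement of 14.2% corresponds to a difference of 0.142 between the MAP values of two systems on this 0-to-1 scale, rather than a 14.2% relative gain.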

Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE Transactions on | 2013, Issue 7 | pp. 1330-1342 | 13 pages
Authors
Chan C.-A.; Lee L.-S.
Affiliation
Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan
Original format
PDF
Text language
English
Keywords
Acoustics; Data models; Hidden Markov models; Speech; Speech recognition; Trajectory; Viterbi algorithm; Acoustic segment model; dynamic time warping; unsupervised spoken term detection; zero-resource

Similar literature

1. Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection [J]. Cheng-Tao Chung, Cheng-Yu Tsai, Chia-Hsiang Liu, Audio, Speech, and Language Processing, IEEE/ACM Transactions on. 2017, Issue 10
2. Multilingual query-by-example spoken term detection in Indian languages [J]. Abhimanyu Popli, Arun Kumar. International Journal of Speech Technology. 2019, Issue 1
3. Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection [J]. Madhavi Maulik C., Patil Hemant A. Computer Speech and Language. 2019, Issue NOVa
4. Toward unsupervised model-based spoken term detection with spoken queries without annotated data [C]. Chan Chun-an, Chung Cheng-Tao, Kuo Yu-Hsin, IEEE International Conference on Acoustics, Speech and Signal Processing. 2013
5. Adaptation and Augmentation: Towards Better Rescoring Strategies for Automatic Speech Recognition and Spoken Term Detection [D]. Ma, Min. 2018
6. Near-term fetal response to maternal spoken voice [O]. Kristin M. Voegtline, Kathleen A. Costigan, Heather A. Pater
7. Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity [O]. Chung, Cheng-Tao, Chan, Chun-an, Lee, Lin-shan. 2015
