Re-ranking of spoken term detections using CRF-based triphone detection models

Abstract

Conventional spoken term detection (STD) techniques, which apply text-based matching to the output of automatic speech recognition (ASR) systems, are not robust to speech recognition errors. This paper proposes a conditional random fields (CRF)-based re-ranking approach that recomputes the detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models built on features generated from multiple types of phoneme-based transcriptions. These models learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework. As a result, they can detect each triphone that composes a query term and assign it a detection probability. In an experimental evaluation on the Japanese OOV test collection, the CRF-based approach alone could not outperform the conventional DTW-based approach that we proposed previously; however, it worked well in the re-ranking (second-pass) process applied to the detections from the DTW-based approach. The CRF-based re-ranking approach improved STD performance by 2.4% in F-measure.
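The two-pass pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the unit-cost subsequence DTW, the averaging of triphone probabilities, the linear interpolation with a `weight` parameter, and all names (`query_triphones`, `dtw_score`, `rerank`, `triphone_prob`) are illustrative assumptions. The trained CRF triphone models are replaced by a caller-supplied function that returns a detection probability for a triphone in a candidate segment.

```python
# Hypothetical sketch of two-pass STD re-ranking; NOT the authors' implementation.
from typing import Callable, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


def query_triphones(phonemes: Sequence[str]) -> List[Triphone]:
    """Decompose a query's phoneme sequence into its overlapping triphones."""
    return [(phonemes[i], phonemes[i + 1], phonemes[i + 2])
            for i in range(len(phonemes) - 2)]


def dtw_score(query: Sequence[str], transcript: Sequence[str]) -> float:
    """First pass (assumed form): subsequence DTW with unit edit costs.

    The query may start and end anywhere in the transcript. Returns a
    similarity in [0, 1]: 1 - best_cost / len(query), floored at 0.
    """
    n, m = len(query), len(transcript)
    if n == 0:
        return 0.0
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[:i] ending at transcript position j.
    # Row 0 is all zeros, so a match may start at any transcript position.
    cost = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)  # first i query phonemes deleted
        for j in range(1, m + 1):
            sub = 0.0 if query[i - 1] == transcript[j - 1] else 1.0
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1.0,      # query phoneme deleted
                             cost[i][j - 1] + 1.0)      # extra transcript phoneme
    best = min(cost[n])  # free end: take the best end position
    return max(0.0, 1.0 - best / n)


def rerank(detections: List[Tuple[str, float]],
           triphones: List[Triphone],
           triphone_prob: Callable[[Triphone, str], float],
           weight: float = 0.5) -> List[Tuple[str, float]]:
    """Second pass (assumed form): interpolate each first-pass score with the
    mean detection probability of the query's triphones, then re-sort."""
    rescored = []
    for seg_id, first_pass in detections:
        probs = [triphone_prob(t, seg_id) for t in triphones]
        # Queries shorter than three phonemes have no triphones: keep the DTW score.
        crf = sum(probs) / len(probs) if probs else first_pass
        rescored.append((seg_id, (1.0 - weight) * first_pass + weight * crf))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = ["k", "a", "s", "a", "i"]  # toy query phonemes
    transcripts = {
        "utt1": ["w", "a", "k", "a", "s", "a", "i", "d", "e", "s", "u"],
        "utt2": ["k", "a", "t", "a", "i", "d", "e"],
    }
    first = [(u, dtw_score(query, t)) for u, t in transcripts.items()]
    # Stand-in for trained CRF triphone models: one made-up probability per segment.
    toy_crf = {"utt1": 0.9, "utt2": 0.2}
    print(rerank(first, query_triphones(query), lambda tri, seg: toy_crf[seg]))
```

In this toy run, `utt1` contains the query exactly (DTW score 1.0) and `utt2` differs by one substitution (DTW score 0.8); the made-up CRF probabilities then widen the gap between them. Averaging triphone probabilities and interpolating linearly is only one plausible way to combine the two passes; the paper's actual scoring may differ.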

Bibliographic Information

Source: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014, pp. 1-4 (4 pages)
Authors: Sawada Naoki; Natori Satoshi; Nishizaki Hiromitsu
Original format: PDF
Keywords
pattern matching; random processes; speech recognition; text analysis; CRF-based re-ranking approach; CRF-based triphone detection model; DTW-based approach; F-measure; Japanese OOV test collection; conditional random field; detection score recomputation; phoneme-based dynamic time warping; phoneme-based transcriptions; recognition error patterns; sequence labeling problem; spoken term detection; text-based matching approach; Feature extraction; Hidden Markov models; Indexes; Probability; Speech; Speech recognition; Training;

Similar Literature

1. Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection [J]. Naoki Sawada, Hiromitsu Nishizaki. IEICE Transactions on Information and Systems, 2016, Issue 10.

2. Model-Based Unsupervised Spoken Term Detection with Spoken Queries [J]. C.-A. Chan, L.-S. Lee. IEEE Transactions on Audio, Speech, and Language Processing, 2013, Issue 7.

3. Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection [J]. Neil Kleynhans, William Hartman, Daniel van Niekerk. Procedia Computer Science, 2016, Issue 1.

4. Re-ranking of spoken term detections using CRF-based triphone detection models [C]. Naoki Sawada, Satoshi Natori, Hiromitsu Nishizaki. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014.

5. Discriminative Articulatory Feature-based Pronunciation Models with Application to Spoken Term Detection [D]. Rohit Prabhavalkar. 2013.

6. CRF-Based Model for Instrument Detection and Pose Estimation in Retinal Microsurgery [O]. Mohamed Alsheakhali, Abouzar Eslami, Hessam Roodaki. 2016.

7. Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity [O]. Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee. 2015.

8. Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection (Pub Version, Open Access) [R]. N. Kleynhans, W. Hartman, D. Van Niekerk. 2016.
