Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition

Abstract

The performance of speech recognition systems degrades when the basic sound units used are poorly defined or inconsistently used. Several attempts have been made to improve dictionaries automatically, either by redefining pronunciations of words in terms of existing sound units, or by redefining the sound units themselves completely. The problem with these approaches is that, while the former is limited by the sound units used, the latter discards all human information that has been incorporated into an expert-designed recognition dictionary. In this paper we propose a new merging-and-splitting algorithm that attempts to redefine the basic sound units used in the dictionary, while maintaining the expert knowledge built into a manually designed dictionary. Sound units from an existing dictionary are merged based on their inherent confusability, as measured by a Monte-Carlo based metric, and subsequently split to maximize the likelihood of the training data. Experiments with the Resource Management database indicate that this approach results in an improvement in recognition accuracy when context-independent models are used for recognition. When context-dependent models are used, the improvement observed is reduced.
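
The abstract does not spell out the Monte-Carlo metric, so the following is only a rough sketch of what a sampling-based confusability measure and the choice of a merge candidate might look like. Everything in it is an assumption made for illustration: each sound unit is reduced to a single 1-D Gaussian given as (mean, std), the unit names AA, AH and S are hypothetical, and the functions mc_confusability and most_confusable_pair are not taken from the paper. The splitting step, which the abstract says maximizes the likelihood of the training data, is not shown.

```python
import math
import random


def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian (toy stand-in for a unit's acoustic model)."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def mc_confusability(a, b, n_samples=5000, seed=0):
    """Monte Carlo estimate of how often a sample drawn from one unit
    is scored higher by the other unit, averaged over both directions."""
    rng = random.Random(seed)

    def one_way(src, other):
        hits = 0
        for _ in range(n_samples):
            x = rng.gauss(src[0], src[1])  # draw a sample from the source unit
            if log_gauss(x, *other) > log_gauss(x, *src):
                hits += 1  # the competing unit explains this sample better
        return hits / n_samples

    return 0.5 * (one_way(a, b) + one_way(b, a))


def most_confusable_pair(units):
    """Return the pair of unit names with the highest estimated confusability."""
    names = sorted(units)
    pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]]
    return max(pairs, key=lambda pq: mc_confusability(units[pq[0]], units[pq[1]]))


# Hypothetical units: two heavily overlapping vowels and one distant fricative.
units = {"AA": (0.0, 1.0), "AH": (0.3, 1.0), "S": (6.0, 1.0)}
pair = most_confusable_pair(units)
print(pair)  # the two overlapping units are the merge candidate
```

In this toy setting the two overlapping units are selected as the merge candidate, while the distant unit is almost never confused with either. A real system would sample from the HMM state distributions of each unit rather than from a single Gaussian.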

Bibliographic record

Source
6th International Conference on Spoken Language Processing (ICSLP 2000), Oct. 16-20, 2000, Beijing International Convention Center, Beijing, China | 2000 | pp. 151-154 | 4 pages
Authors
Rita Singh; Bhiksha Raj; Richard M. Stern
Original format: PDF
Chinese Library Classification: Culture and cultural undertakings of the countries of the world
