Image processing methods to segment speech spectrograms for word level recognition

Abstract

The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained statistically to learn an individual speaker's voice and larger vocabularies, but accuracy is not ideal. A small gap between actual speech and its acoustic representation in the statistical mapping causes Hidden Markov Model (HMM) methods to fail to match the acoustic speech signals, which leads to classification errors. These errors in the low-level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low-level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high-level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
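
To make the idea of treating a spectrogram as an image, cutting it into segments and comparing segments by dynamic time warping more concrete, the following is a minimal Python sketch. It is not the SIR-CODE method, which the abstract does not specify in detail. The function names (spectrogram_image, segment_frames, dtw_distance), the frame length, hop size and the -40 dB threshold are all illustrative assumptions, and two synthetic tone bursts stand in for real speech.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram as a 2-D array (frequency bins x frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10).T


def segment_frames(image, threshold_db=-40.0):
    """Return (start, end) frame ranges whose peak level lies within
    threshold_db of the global peak (hypothetical segmentation rule)."""
    active = image.max(axis=0) > image.max() + threshold_db
    segments, start = [], None
    for t, is_active in enumerate(active):
        if is_active and start is None:
            start = t
        elif not is_active and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments


def dtw_distance(a, b):
    """Dynamic time warping cost between two segments (bins x frames)."""
    n, m = a.shape[1], b.shape[1]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[:, i - 1] - b[:, j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]


# Synthetic stand-in for speech: two tone bursts separated by silence.
rate = 8000                      # assumed sampling rate in Hz
n = 1600                         # 0.2 s at 8 kHz
t = np.arange(n) / rate
silence = np.zeros(n)
signal = np.concatenate([silence, np.sin(2 * np.pi * 500 * t), silence,
                         np.sin(2 * np.pi * 1500 * t), silence])

image = spectrogram_image(signal)
segments = segment_frames(image)
first = image[:, segments[0][0]:segments[0][1]]
second = image[:, segments[1][0]:segments[1][1]]
print(segments, dtw_distance(first, first), dtw_distance(first, second))
```

In this sketch a segment compared with itself gives a DTW cost of zero, while the two bursts at different frequencies give a positive cost. A word-level matcher along the lines the abstract describes would compare such segments against stored templates rather than against each other.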

Bibliographic details

Author: Al-Darkazali Mohammed
Year: 2017
Original format: PDF
Language of text: English

Similar literature

1. Investigation of the effect of spectrogram images and different texture analysis methods on speech emotion recognition [J]. Turgut Özseven. Applied Acoustics. 2018, Issue DECa

2. Auditory-Inspired Morphological Processing of Speech Spectrograms: Applications in Automatic Speech Recognition and Speech Enhancement [J]. Joyner Cadore, Francisco J. Valverde-Albacete, Ascensión Gallardo-Antolín. Cognitive Computation. 2013, Issue 4

3. Defining properties of speech spectrogram images to allow effective pre-processing prior to pattern recognition [C]. Aldarkazali Mohammed, Young Rupert, Chatwin Chris. Optical Pattern Recognition XXIV. 2013

4. An image recognition system using two-way communication between high level semantic processing and low level image processing [D]. Prehmus, Alan W. 1985

5. Development of a Two-Stage Procedure for the Automatic Recognition of Dysfluencies in the Speech of Children Who Stutter: II. ANN Recognition of Repetitions and Prolongations With Supplied Word Segment Markers [O]. Peter Howell, Stevie Sackin, Kazan Glenn

6. Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement [O]. Cadore Joyner, Valverde-Albacete Francisco J., Gallardo-Antolín Ascensión. 2012

7. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015
