One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noises, etc.) and to the variability of speech across different speakers (e.g. different accents, coarticulation of phonemes and different vocal tract characteristics). Vocal tract length normalisation (VTLN) aims to normalise the acoustic signal, making it independent of the vocal tract length. This is done by a speaker-specific warping of the frequency axis, parameterised by a warping factor. In this thesis the application of VTLN to multiparty conversational speech was investigated, focusing on the meeting domain. This is a challenging task, as the speech acoustics vary greatly both across speakers and over time for a given speaker. VTL, the distance between the lips and the glottis, varies over time. We observed that the warping factors estimated using maximum likelihood appear to be context dependent: they are influenced by the current conversational partner and correlated with the behaviour of the formant positions and the pitch. This is because VTL also influences the frequency of vibration of the vocal cords, and thus the pitch. In this thesis we also investigated pitch-adaptive acoustic features with the goal of further improving the speaker normalisation provided by VTLN.

We explored the use of acoustic features obtained using a pitch-adaptive analysis in combination with conventional features such as Mel frequency cepstral coefficients. These spectral representations were combined both at the acoustic feature level, using heteroscedastic linear discriminant analysis (HLDA), and at the system level, using ROVER. We evaluated this approach on a challenging large vocabulary speech recognition task: multiparty meeting transcription.
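The speaker-specific frequency warping at the heart of VTLN can be sketched as follows. This is a minimal illustration only, assuming a piecewise-linear warping function (one common parameterisation); the function name, sampling rate and knee position are chosen here for illustration and are not taken from the thesis.

```python
import numpy as np

def vtln_warp(freqs, alpha, f_max=8000.0, f_cut=0.85):
    """Piecewise-linear VTLN warping of the frequency axis.

    alpha is the speaker-specific warping factor (alpha > 1 compresses
    the spectrum of a speaker with a shorter vocal tract, alpha < 1
    stretches it).  Below a knee frequency the axis is scaled by alpha;
    above it, a second linear segment maps f_max onto itself so the
    warped axis stays within [0, f_max].
    """
    freqs = np.asarray(freqs, dtype=float)
    knee = f_cut * min(1.0, 1.0 / alpha) * f_max
    return np.where(
        freqs <= knee,
        alpha * freqs,
        alpha * knee + (f_max - alpha * knee) * (freqs - knee) / (f_max - knee),
    )
```

In practice such a warp is applied to the filterbank centre frequencies before cepstral analysis, and the warping factor is chosen per speaker, e.g. by a maximum-likelihood search over a grid of candidate values.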
We found that VTLN benefits the most from pitch-adaptive features. Our experiments also suggested that combining conventional and pitch-adaptive acoustic features using HLDA results in a consistent, significant decrease in the word error rate across all tasks. Combining at the system level using ROVER resulted in a further significant improvement. Further experiments compared the use of a pitch-adaptive spectral representation with the adoption of a smoothed spectrogram for the extraction of cepstral coefficients. We found that pitch-adaptive spectral analysis, by providing a representation that is less affected by pitch artefacts (especially for high-pitched speakers), delivers features with improved speaker independence. This was also shown to be advantageous when HLDA is applied. The combination of a pitch-adaptive spectral representation and VTLN-based speaker normalisation in the context of LVCSR for multiparty conversational speech led to more speaker-independent acoustic models, improving the overall recognition performance.
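The feature-level combination described above can be pictured as concatenating the two feature streams frame by frame and projecting the result to a lower dimension. As a simplified stand-in for HLDA (which generalises LDA by dropping the equal-class-covariance assumption), the sketch below uses plain Fisher LDA; all array names are hypothetical and not taken from the thesis.

```python
import numpy as np

def lda_project(X, y, n_components):
    """Fisher LDA projection of concatenated feature streams.

    Simplified stand-in for HLDA: both project (frames, dims) features
    X to n_components dimensions using per-frame class labels y
    (e.g. phone or state labels), maximising between-class relative
    to within-class scatter.
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T
    # Directions solving the generalised eigenproblem Sb v = lambda Sw v
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    W = evecs[:, order[:n_components]].real
    return X @ W

# Hypothetical usage: combine an MFCC stream with a pitch-adaptive
# stream frame by frame, then project to a standard dimensionality:
#   combined  = np.hstack([mfcc, pitch_adaptive])
#   projected = lda_project(combined, state_labels, 39)
```

In a full system the projection would be estimated on the training data and applied identically at test time; system-level combination with ROVER instead votes over the word hypotheses produced by separately trained recognisers.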