Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions

Abstract

This paper proposes an automatic algorithm for estimating the first two subglottal resonances (SGRs), Sg1 and Sg2, from children's continuous speech, and applies it to automatic speaker normalization in mismatched, limited-data conditions. The proposed algorithm is based on the observation that Sg1 and Sg2 form phonological vowel feature boundaries, and is motivated by our recent SGR estimation algorithm for adults. The algorithm is trained on 25 children and evaluated on 9 children, aged between 7 and 18 years. The average RMS errors in estimating Sg1 and Sg2 are 55 and 144 Hz, respectively. Applying the proposed algorithm to a connected-digit speech recognition task shows that: 1) linear frequency warping using Sg1 or Sg2 is comparable to or better than maximum likelihood-based vocal tract length normalization (ML-VTLN); 2) the performance of SGR-based frequency warping is less content-dependent than that of ML-VTLN; and 3) SGR-based frequency warping can be integrated into ML-VTLN to yield a statistically significant improvement in performance.
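The abstract compares linear frequency warping driven by an SGR estimate against ML-VTLN, and reports estimation accuracy as RMS error. The sketch below illustrates both ideas under stated assumptions: the warping factor is assumed to be the ratio of a reference SGR to the test speaker's SGR (the abstract does not give the exact definition), and the function names `sgr_warp_factor`, `warp_linear`, and `rms_error`, as well as the example frequencies, are hypothetical and are not data or code from the paper.

```python
import numpy as np


def sgr_warp_factor(sg_test_hz: float, sg_ref_hz: float) -> float:
    """Warping factor mapping the test speaker's SGR onto a reference SGR.

    Assumption (not from the paper): alpha = Sg_ref / Sg_test.
    """
    if sg_test_hz <= 0:
        raise ValueError("SGR estimate must be positive")
    return sg_ref_hz / sg_test_hz


def warp_linear(freqs_hz, alpha: float) -> np.ndarray:
    """Pure linear warping f' = alpha * f, e.g. of filterbank center frequencies."""
    return alpha * np.asarray(freqs_hz, dtype=float)


def rms_error(estimates_hz, references_hz) -> float:
    """Root-mean-square error between estimated and reference SGR values (Hz)."""
    diff = np.asarray(estimates_hz, dtype=float) - np.asarray(references_hz, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical values for illustration only (not measurements from the paper).
alpha = sgr_warp_factor(sg_test_hz=2000.0, sg_ref_hz=1600.0)
print(alpha)  # 0.8
print(warp_linear([1000.0, 2000.0], alpha))
print(rms_error([110.0, 90.0], [100.0, 100.0]))  # 10.0
```

When the test speaker's SGR is higher than the reference, alpha is below 1 and the test spectrum is compressed toward the reference scale. Practical VTLN front ends usually apply a piecewise-linear warp so that frequencies near the Nyquist limit stay in range; pure linear scaling is used here only to keep the sketch short. ML-VTLN, by contrast, selects alpha by maximizing the acoustic-model likelihood over a grid of candidate factors rather than deriving it from a measured resonance.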

Bibliographic details

Source: Annual Conference of the International Speech Communication Association (INTERSPEECH), 2012, pp. 1266-1269 (4 pages)
Authors: Harish Arsikere; Gary K. F. Leung; Steven M. Lulich; Abeer Alwan
Original format: PDF
Keywords: subglottal resonances; children's speech; automatic estimation; limited data; speaker normalization

Similar literature

1. Automatic detection of the second subglottal resonance and its application to speaker normalization [J]. Shizhen Wang, Steven M. Lulich, Abeer Alwan. The Journal of the Acoustical Society of America, 2009, no. 6.
2. Automatic estimation of the first subglottal resonance [J]. Harish Arsikere, Steven M. Lulich, Abeer Alwan. The Journal of the Acoustical Society of America, 2011, no. 5.
3. Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review [J]. Victoria Young, Alex Mihailidis. Assistive Technology: The Official Journal of RESNA, 2010, no. 2.
4. Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions [C]. Harish Arsikere, Gary K. F. Leung, Steven M. Lulich. INTERSPEECH 2012, 2012.
5. Rapid Speaker Normalization and Adaptation with Applications to Automatic Evaluation of Children's Language Learning Skills [D]. Shizhen Wang. 2010.
6. Subglottal resonances of adult male and female native speakers of American English [O]. Steven M. Lulich, John R. Morton, Harish Arsikere.
7. Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization [O]. 2008.
