The Journal of the Acoustical Society of America

Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing


Abstract

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments investigating both prosodic and spectral variations of untrained karaoke singers is considered for three languages: American English, Hindi, and Farsi. A comprehensive comparison of common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels, was performed. Collective changes in the corresponding overall acoustic spaces were analyzed based on the Kullback-Leibler distance, using Gaussian probability distribution models trained on spectral features. Finally, these models were used in a Gaussian mixture model with universal background model (GMM-UBM) SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).
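The Kullback-Leibler comparison of Gaussian acoustic models mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names, the single full-covariance Gaussian per speaking style, the symmetrized form of the distance, and the synthetic feature arrays are all assumptions made for illustration; the abstract does not specify the feature type or the exact distance variant.

```python
import numpy as np


def fit_gaussian(features):
    """Fit one full-covariance Gaussian to an (n_frames, n_dims) feature array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(P || Q) between two multivariate Gaussians."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    cov_p = np.asarray(cov_p, dtype=float)
    cov_q = np.asarray(cov_q, dtype=float)
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    # slogdet is numerically safer than log(det(...)) for small determinants.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )


def symmetric_kl(mu_p, cov_p, mu_q, cov_q):
    """Symmetrized KL distance (assumed variant): KL(P||Q) + KL(Q||P)."""
    return gaussian_kl(mu_p, cov_p, mu_q, cov_q) + gaussian_kl(mu_q, cov_q, mu_p, cov_p)


if __name__ == "__main__":
    # Synthetic stand-ins for spectral features of speaking and singing frames.
    rng = np.random.default_rng(0)
    speaking = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
    singing = rng.normal(loc=0.5, scale=1.5, size=(500, 4))
    distance = symmetric_kl(*fit_gaussian(speaking), *fit_gaussian(singing))
    print(f"symmetric KL distance: {distance:.3f}")
```

A larger distance indicates that the two fitted distributions occupy more dissimilar regions of the feature space. With real data, the synthetic arrays would be replaced by frame-level spectral features extracted from the spoken and sung recordings of the same text.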

Bibliographic record

  • Author affiliation

    University of Texas at Dallas, Center for Robust Speech Systems (CRSS), Robust Speech Technology Lab (RSTL), Richardson, TX 75080, USA (the same affiliation is given for all three entries)

  • Original format: PDF
  • Language of full text: English
  • Chinese Library Classification: Acoustics
