...
Journal: IEEE Transactions on Audio, Speech, and Language Processing

Pseudo pitch synchronous analysis of speech with applications to speaker recognition

Abstract

The fine spectral structure related to pitch information is conveyed in Mel cepstral features, so variations in pitch cause variations in the features. For speaker recognition systems, this phenomenon, known as "pitch mismatch" between training and testing, can increase error rates. Likewise, pitch-related variability may increase error rates in speech recognition systems for languages, such as English, in which pitch does not carry phonetic information. In addition, for both speech recognition and speaker recognition systems, the raw speech signal is traditionally parsed into frames using a constant frame size and a constant frame offset, without aligning the frames to the natural pitch cycles. As a result, the power spectral estimation performed as part of the Mel cepstral computation may include artifacts. Pitch-synchronous methods have addressed this problem in the past, at the cost of added complexity from using a variable frame size and/or offset. To address these problems, this paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each frame to its natural pitch cycle and avoid truncating pitch cycles, while still using a constant frame size and frame offset. Text-independent speaker recognition experiments on NIST speaker recognition tasks show a performance improvement when scores from systems using PPS are fused with traditional speaker recognition scores. In addition, a better distribution of errors across trials may be obtained at similar error rates, and the experiments offer some insight into the role of the fundamental frequency in speaker recognition. Speech recognition experiments on the Aurora-2 noisy digits task also show improved robustness and better accuracy on data with extremely low signal-to-noise ratio (SNR).
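
The abstract does not spell out the PPS procedure, so the following is only a hypothetical illustration, not the authors' method. It contrasts conventional framing (constant frame size, constant frame offset) with one assumed way of avoiding truncated pitch cycles while keeping both constants: inside each fixed-size frame, keep only the samples between the first and last pitch mark and zero the partial cycles at the edges. The pitch marks (sample indices of cycle boundaries) are assumed to come from an external pitch tracker, and the function names `frame_starts`, `pps_mask`, and `pps_frames` are illustrative.

```python
from typing import List


def frame_starts(num_samples: int, frame_len: int, frame_shift: int) -> List[int]:
    """Start index of every conventional frame: constant size, constant offset."""
    return list(range(0, num_samples - frame_len + 1, frame_shift))


def pps_mask(start: int, frame_len: int, pitch_marks: List[int]) -> List[int]:
    """Hypothetical PPS-style mask for one fixed-size frame (an assumption).

    Keeps only the samples between the first and last pitch mark that fall
    inside the frame, so the frame holds whole pitch cycles; partial cycles
    at either edge are zeroed. Frames with fewer than two marks (no complete
    cycle, e.g. unvoiced speech) are left unchanged.
    """
    end = start + frame_len
    inside = [m for m in pitch_marks if start <= m < end]
    if len(inside) < 2:
        return [1] * frame_len
    first, last = inside[0] - start, inside[-1] - start
    return [1 if first <= i < last else 0 for i in range(frame_len)]


def pps_frames(signal: List[float], frame_len: int, frame_shift: int,
               pitch_marks: List[int]) -> List[List[float]]:
    """Cut the signal into fixed-size, fixed-offset frames, then mask each one."""
    frames = []
    for start in frame_starts(len(signal), frame_len, frame_shift):
        mask = pps_mask(start, frame_len, pitch_marks)
        chunk = signal[start:start + frame_len]
        frames.append([x * m for x, m in zip(chunk, mask)])
    return frames


if __name__ == "__main__":
    # Toy setup (assumed values): 8 kHz audio, 25 ms frames (200 samples),
    # 10 ms shift (80 samples), 125 Hz pitch, so one cycle = 64 samples.
    signal = [1.0] * 1000
    marks = list(range(0, 1000, 64))
    frames = pps_frames(signal, 200, 80, marks)
    # Each sum counts kept samples; every one is a whole number of cycles.
    print(len(frames), [int(sum(f)) for f in frames[:3]])  # 11 [192, 128, 128]
```

Because every frame keeps its original length, downstream stages such as the FFT and Mel filterbank need no change; only the content of each frame is restricted to complete pitch cycles. A practical system would also have to cope with pitch-tracking errors and might taper the edges rather than hard-zero them.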