Conference: International Conference on Text, Speech and Dialogue

Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

Abstract

The paper presents and evaluates a speaker de-identification technique that combines speech recognition with two speech synthesis techniques. The phoneme recognition system is built using HMM-based acoustical models of context-dependent diphone speech units, and two different speech synthesis systems (diphone TD-PSOLA-based and HMM-based) re-synthesize the recognized sequences of speech units. Since the acoustical models of the two speech synthesis systems are assumed to be completely independent of the input speaker's voice, the highest level of input speaker de-identification is ensured. The proposed de-identification system is considered to be language dependent but is vocabulary and speaker independent, since it is based mainly on acoustical modelling of the selected diphone speech units. Because the computing methods are relatively simple, the whole de-identification procedure runs in real time. The speech outputs are compared and assessed by testing the intelligibility of the re-synthesized speech from different points of view. The assessment results show that the evaluators' transcriptions vary with the input speaker, the synthesis method applied and the evaluators' capabilities. Despite the relatively high phoneme recognition error rate (approx. 19%), the re-synthesized speech is in many cases still fully intelligible.
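The abstract does not say how the phoneme recognition error rate was computed. A common definition, assumed here, is the edit distance (substitutions, deletions and insertions) between the reference and recognized phoneme sequences, divided by the reference length. The minimal sketch below illustrates that definition only; the function names and the phoneme symbols in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # symbol of ref missing in hyp
            insertion = curr[j - 1] + 1  # extra symbol in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, with a hypothetical reference `["d", "o", "b", "e", "r", "d", "a", "n"]` and a recognized sequence `["d", "o", "p", "e", "r", "a", "n"]`, one substitution and one deletion give an error rate of 2/8 = 0.25. Under this definition, an error rate of about 19% would correspond to roughly one edited phoneme in every five reference phonemes.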
