Journal: Speech Communication

Phonetic alignment: speech synthesis-based vs. Viterbi-based

Abstract

In this paper we compare two methods for automatic phonetic labeling of a continuous speech database, a task usually required when designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of the speech with a synthetic speech pattern; the second uses either continuous-density hidden Markov models (HMMs) or a hybrid HMM/ANN (artificial neural network) system in forced-alignment mode. Both systems were evaluated on read utterances that were not part of the training set of the HMM systems, and compared with manual segmentation. This study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage (hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (a phonetic lattice). We derive a method for the automatic creation of large phonetically labeled speech databases, in which the synthesis-based segmentation tool bootstraps the training of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to developing improved multilingual speech synthesis and recognition systems.
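
As an illustration of the first method, the sketch below aligns a natural utterance with a synthetic reference and carries the known phone boundaries of the reference over to the natural speech. The abstract does not say which alignment algorithm the authors used; dynamic time warping (DTW) is assumed here as one common way to perform such a temporal alignment. The function names `dtw_path` and `transfer_boundaries`, the scalar "features", and the absolute-difference distance are hypothetical simplifications; a real system would compare multidimensional acoustic feature vectors.

```python
from math import inf


def dtw_path(ref, test, dist=lambda a, b: abs(a - b)):
    """Return a minimum-cost warping path as (ref_index, test_index) pairs.

    ref  : feature sequence of the synthetic utterance (assumed input)
    test : feature sequence of the natural utterance (assumed input)
    """
    n, m = len(ref), len(test)
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both sequences
                                 cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1])      # advance test only

    # Backtrack from the last cell to the first, following cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        predecessors = [(cost[i - 1][j - 1], i - 1, j - 1),
                        (cost[i - 1][j], i - 1, j),
                        (cost[i][j - 1], i, j - 1)]
        _, i, j = min(predecessors)
    path.reverse()
    return path


def transfer_boundaries(path, ref_starts):
    """Map each phone start frame in the reference to the first test frame aligned with it."""
    first_match = {}
    for r, t in path:
        first_match.setdefault(r, t)
    return [first_match[r] for r in ref_starts]


# Toy data: three "phones" with feature values 0, 5 and 9.
synthetic = [0, 0, 5, 5, 9, 9]           # phone starts are known by construction
synthetic_starts = [0, 2, 4]
natural = [0, 0, 0, 5, 5, 5, 5, 9, 9]    # same phones, spoken more slowly

natural_starts = transfer_boundaries(dtw_path(synthetic, natural), synthetic_starts)
print(natural_starts)  # [0, 3, 7]
```

Because the synthetic reference is generated from the phonetic transcription, its phone boundaries come for free, which is why this kind of approach needs no labeled training data. The labels it produces could then serve as the initial segmentation for training an HMM or hybrid HMM/ANN aligner, in the spirit of the bootstrapping idea described in the abstract.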
