IEICE Transactions on Information and Systems

Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units


Abstract

To produce high-quality synthesis, a concatenation-based Text-to-Speech (TTS) system usually requires a large number of segmental units to cover various acoustic-phonetic contexts. However, careful manual labeling and segmentation by human experts is labor-intensive, even though it is still the most reliable way to prepare such units. In this paper we adopt a two-step procedure to automate the labeling, segmentation and refinement process. In the first step, the speech data are coarsely segmented by aligning the speech signals with the corresponding sequence of Hidden Markov Models (HMMs). In the second step, the segment boundaries are refined with a proposed Context-Dependent Boundary Model (CDBM). A Classification and Regression Tree (CART) organizes the available data into a hierarchical tree in which acoustically similar boundaries are clustered together to train tied CDBMs for boundary refinement. Optimal CDBM parameters and training conditions are determined through a series of experiments. Compared with a manual segmentation reference, the CDBMs improve segmentation accuracy (within a tolerance of 20 ms) from 78.1% (baseline) to 94.8% in Mandarin Chinese and from 81.4% to 92.7% in English. About 1,000 manually segmented sentences were used to train these models. To further reduce the amount of manually segmented data needed to train CDBMs for a new speaker, we adapt a well-trained CDBM with efficient adaptation algorithms. With only 10-20 manually segmented sentences as adaptation data, the adapted CDBM achieves a segmentation accuracy of 90%.
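The refinement step of the two-step procedure can be illustrated with a minimal sketch. The abstract does not specify the form of a CDBM, so this code makes two hypothetical assumptions:

- A boundary model is a diagonal Gaussian over the stacked feature frames on both sides of a boundary.
- Refinement searches a small window around the HMM-aligned boundary and keeps the position with the highest log-likelihood.

The names `BoundaryModel`, `stack_window` and `refine_boundary` are illustrative only. The CART step, which picks the tied model for a given phonetic context, is omitted.

```python
import numpy as np


class BoundaryModel:
    """Hypothetical diagonal-Gaussian boundary model (an assumption, not the paper's definition).

    It scores the stacked feature vectors of `half_width` frames on each
    side of a candidate boundary.
    """

    def __init__(self, mean: np.ndarray, var: np.ndarray, half_width: int):
        self.mean = mean              # shape (2 * half_width * dim,)
        self.var = var                # diagonal variances, same shape as mean
        self.half_width = half_width

    def log_likelihood(self, x: np.ndarray) -> float:
        # Log-density of a diagonal Gaussian evaluated at x.
        diff = x - self.mean
        return float(-0.5 * np.sum(np.log(2 * np.pi * self.var) + diff ** 2 / self.var))


def stack_window(feats: np.ndarray, boundary: int, half_width: int) -> np.ndarray:
    # Concatenate frames [boundary - half_width, boundary + half_width) into one vector.
    return feats[boundary - half_width: boundary + half_width].reshape(-1)


def refine_boundary(feats: np.ndarray, coarse: int, model: BoundaryModel, search: int) -> int:
    """Move a coarse (HMM-aligned) boundary to the best-scoring frame within +/- search frames."""
    k = model.half_width
    lo = max(k, coarse - search)               # keep the window inside the utterance
    hi = min(len(feats) - k, coarse + search)
    if lo > hi:                                # no valid candidate: keep the coarse boundary
        return coarse
    return max(range(lo, hi + 1),
               key=lambda b: model.log_likelihood(stack_window(feats, b, k)))


# Toy usage: 1-D features that step from 0 to 1 at frame 12.
feats = np.concatenate([np.zeros((12, 1)), np.ones((8, 1))])
model = BoundaryModel(mean=np.array([0.0, 0.0, 1.0, 1.0]),
                      var=np.full(4, 0.1), half_width=2)
print(refine_boundary(feats, coarse=9, model=model, search=4))  # → 12
```

In this toy case the coarse boundary at frame 9 moves to frame 12, where the stacked window matches the model mean exactly.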
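The accuracy figures above count an automatic boundary as correct when it lies within 20 ms of the manual reference. The sketch below computes that metric under two assumptions: the automatic and manual boundaries are already paired one-to-one, and a deviation of exactly 20 ms counts as a hit. The abstract does not say whether the tolerance is inclusive. The function name is hypothetical.

```python
def boundary_accuracy(auto_ms, ref_ms, tolerance_ms=20.0):
    """Percentage of automatic boundaries within tolerance_ms of the paired reference."""
    if len(auto_ms) != len(ref_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not ref_ms:
        raise ValueError("no boundaries to score")
    # Count boundaries whose absolute deviation does not exceed the tolerance.
    hits = sum(abs(a - r) <= tolerance_ms for a, r in zip(auto_ms, ref_ms))
    return 100.0 * hits / len(ref_ms)


# Deviations are 5, 30, 10 and 20 ms, so three of four boundaries count as hits.
print(boundary_accuracy([100, 250, 410, 600], [105, 280, 400, 620]))  # → 75.0
```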

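The abstract does not name the adaptation algorithms it uses. As an illustration only, the sketch below uses maximum a posteriori (MAP) re-estimation of a Gaussian mean, a standard technique for adapting a well-trained model with little data. It is not claimed to be the authors' method. The update is μ̂ = (τ·μ₀ + Σᵢ xᵢ) / (τ + n), where μ₀ is the well-trained mean, xᵢ are the new speaker's n samples, and τ is a hypothetical prior weight.

```python
import numpy as np


def map_adapt_mean(prior_mean, adaptation_data, tau=10.0):
    """MAP re-estimate of a Gaussian mean (a standard technique, assumed here for illustration).

    prior_mean: mean vector of the well-trained model.
    adaptation_data: array-like of shape (n, dim) with samples from the new speaker.
    tau: prior weight; larger values keep the estimate closer to prior_mean.
    """
    x = np.asarray(adaptation_data, dtype=float)
    n = x.shape[0]
    # Weighted combination of the prior mean and the sum of the new samples.
    return (tau * np.asarray(prior_mean, dtype=float) + x.sum(axis=0)) / (tau + n)


print(map_adapt_mean([0.0, 0.0], [[2.0, 2.0], [4.0, 4.0]], tau=2.0))  # → [1.5 1.5]
```

With few samples the estimate stays close to the prior mean. As data accumulates it moves toward the sample mean, which suits settings with very little adaptation data.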