IEEE Transactions on Audio, Speech, and Language Processing

Segmentation of Monologues in Audio Books for Building Synthetic Voices




One of the issues in using audio books to build a synthetic voice is the segmentation of large speech files. Using the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of its huge memory requirements. Earlier works have attempted to resolve this problem by using a large-vocabulary speech recognition system with a restricted dictionary and language model. In this paper, we propose suitable modifications to the Viterbi algorithm and demonstrate their usefulness for segmenting large speech files in audio books. The utterances obtained from these files are used to build synthetic voices. We show that synthetic voices built from public-domain audio books have Mel-cepstral distortion scores in the range of 4 to 7, similar to those of voices built from studio-quality recordings such as CMU ARCTIC.
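The abstract does not describe the proposed modifications, so the following Python sketch illustrates only the baseline problem it names. It is a plain left-to-right Viterbi forced alignment in which each state must cover at least one frame, and it stores a full T × S backpointer table (T frames, S states). The function names `forced_align` and `backpointer_bytes`, the toy log-likelihood matrix, and the figures in the note below are illustrative assumptions, not the authors' code or data.

```python
from typing import List

NEG_INF = float("-inf")


def forced_align(log_lik: List[List[float]]) -> List[int]:
    """Baseline Viterbi forced alignment (illustrative, not the paper's variant).

    log_lik[t][s] is an assumed log-likelihood of frame t under state s, where
    states form a strict left-to-right chain built from the transcript. Each
    state must cover at least one frame. Returns the first frame of each state.
    """
    T, S = len(log_lik), len(log_lik[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    # Full T x S backpointer table: this is the part that grows with file length.
    back = [[0] * S for _ in range(T)]
    prev = [NEG_INF] * S
    prev[0] = log_lik[0][0]  # the path must start in state 0
    for t in range(1, T):
        cur = [NEG_INF] * S
        for s in range(S):
            stay = prev[s]
            advance = prev[s - 1] if s > 0 else NEG_INF
            if advance > stay:
                cur[s], back[t][s] = advance + log_lik[t][s], s - 1
            else:
                cur[s], back[t][s] = stay + log_lik[t][s], s
        prev = cur
    # Trace back from the last state; a change of state marks a boundary.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        p = back[t][s]
        if p != s:
            starts[s] = t
        s = p
    return starts


def backpointer_bytes(num_frames: int, num_states: int, bytes_per_entry: int = 4) -> int:
    """Memory for the backpointer table alone, assuming fixed-size entries."""
    return num_frames * num_states * bytes_per_entry


# Toy example: 6 frames, 3 states, each state favoured for two frames.
toy = [
    [0, -5, -5], [0, -5, -5],
    [-5, 0, -5], [-5, 0, -5],
    [-5, -5, 0], [-5, -5, 0],
]
print(forced_align(toy))  # [0, 2, 4]
print(backpointer_bytes(360_000, 129_600))  # 186624000000
```

Under hypothetical assumptions of a 10 ms frame shift (360,000 frames per hour), roughly 12 phones per second, and 3 states per phone (about 129,600 states per hour), the backpointer table alone needs 186,624,000,000 bytes, roughly 187 GB, for one hour of audio. Because the table grows with the product of audio length and transcript length, aligning a whole audio book file in one pass quickly becomes infeasible.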
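Mel-cepstral distortion (MCD), the objective measure quoted above, is commonly computed per frame as MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) and averaged over time-aligned frames. The sketch below assumes the reference and synthesized mel-cepstra are already frame-aligned (for example by dynamic time warping, not shown) and excludes the energy coefficient c0; both are common conventions but may differ from the paper's exact setup. The function name `mel_cepstral_distortion` is hypothetical.

```python
import math
from typing import Sequence

# The (10 / ln 10) factor from the standard MCD definition, giving a value in dB.
DB_FACTOR = 10.0 / math.log(10.0)


def mel_cepstral_distortion(ref: Sequence[Sequence[float]],
                            syn: Sequence[Sequence[float]]) -> float:
    """Mean MCD in dB over frame-aligned mel-cepstra, skipping c0.

    Assumes ref[t] and syn[t] are already time-aligned frames of equal order;
    alignment (e.g. DTW) and cepstral analysis are outside this sketch.
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("expected the same non-zero number of frames")
    total = 0.0
    for r, s in zip(ref, syn):
        if len(r) != len(s):
            raise ValueError("frames must have the same cepstral order")
        # Start at index 1 to exclude the energy coefficient c0.
        sq = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        total += DB_FACTOR * math.sqrt(2.0 * sq)
    return total / len(ref)


# Frames that differ only in c0 give zero distortion under this convention.
ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
syn = [[3.0, 0.5, -0.2], [0.2, 0.4, -0.1]]
print(mel_cepstral_distortion(ref, syn))  # 0.0
```

Because published MCD figures depend on these conventions (which coefficients are included, cepstral order, alignment method), scores are best compared only when computed the same way.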