Segmentation of Monologues in Audio Books for Building Synthetic Voices

Prahallad K.; Black A.W.

Abstract

One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. Using the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of its huge memory requirements. Earlier works have attempted to resolve this problem by using a large-vocabulary speech recognition system with a restricted dictionary and language model. In this paper, we propose suitable modifications to the Viterbi algorithm and demonstrate their usefulness for segmenting large speech files in audio books. The utterances obtained from these files are used to build synthetic voices. We show that synthetic voices built from public-domain audio books have Mel-cepstral distortion scores in the range of 4-7, which is similar to voices built from studio-quality recordings such as CMU ARCTIC.
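As a rough illustration of the metric quoted above, the following Python sketch computes Mel-cepstral distortion (MCD) using the commonly cited definition: the per-frame Euclidean distance between reference and synthesized Mel-cepstral vectors, scaled by (10 / ln 10) * sqrt(2) and averaged over frames. The abstract does not state which variant the authors used, so the exclusion of the 0th (energy) coefficient, the assumption that frames are already time-aligned, and the function name `mel_cepstral_distortion` are all assumptions made here for illustration.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames (hypothetical helper).

    Each frame is a sequence of Mel-cepstral coefficients. The 0th
    coefficient (energy) is skipped, which is a common convention but
    an assumption here, not something stated in the abstract.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need equal, non-zero numbers of frames")

    # Constant factor of the usual MCD definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


if __name__ == "__main__":
    reference = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    synthesized = [[1.0, 0.6, -0.2], [0.9, 0.4, -0.3]]
    print(f"MCD: {mel_cepstral_distortion(reference, synthesized):.3f} dB")
```

Lower values indicate that the synthesized spectra are closer to the reference recordings.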

Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE Transactions on | 2011, Issue 5 | pp. 1444-1449 | 6 pages
Authors
Prahallad K.; Black A.W.
Affiliation

Int. Inst. of Inf. Technol., Hyderabad, India

Format: PDF
Language: English
Keywords
Audio books; forced-alignment; large speech files; text-to-speech (TTS)
