Multiple-Prosody Speech Databases and Their Effectiveness in High-Quality Speech Synthesis at Arbitrary Rates

Tsuyoshi Masuda; Tomoki Toda; Hiromichi Kawanami; Hiroshi Saruwatari; Kiyohiro Shikano

Abstract

This paper discusses a method of high-quality speech synthesis in which the speech rate can be controlled in various ways. When the prosody is adjusted by the PSOLA method or by the synthesis-by-analysis method in the waveform segment connection process, quality declines as the extent of modification increases. To deal with this problem, the paper proposes a method that uses a separate speech database for each speech rate, which reduces the modification of segment durations and so alleviates quality degradation. The proposed method has the following features. (1) Synthesized speech with the target speech rate is produced for each utterance, and is recorded. (2) Speech databases of the same text at different speech rates are constructed. In this study, speech databases at three different speech rates (fast, medium, and slow) were acquired. Speech at two different speech rates (fast and slow) was synthesized both by using the acquired speech databases and by the conventional method (using a speech database at the standard speech rate). Listening experiments showed that the proposed method can synthesize higher-quality speech than the conventional method. When speech databases with different speech rates are combined, there is a danger that speech quality may be degraded by differences in voice quality among the databases. The effect of these voice quality differences was investigated in a listening experiment and was found to be within the tolerable range.
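
The core idea in the abstract, choosing the database whose recorded rate is closest to the target so that segment durations need less modification, can be illustrated with a small sketch. This is not the authors' implementation: the numeric rates, the unit (morae per second), the log-ratio measure, and all function names are assumptions made for illustration, since the abstract gives no figures.

```python
import math

# Hypothetical average speech rates (morae per second) for each database.
# The abstract names the three databases but gives no numeric rates.
DATABASE_RATES = {"slow": 5.0, "medium": 7.5, "fast": 10.0}


def duration_scale(source_rate: float, target_rate: float) -> float:
    """Factor applied to segment durations: >1 stretches, <1 compresses."""
    return source_rate / target_rate


def modification_extent(source_rate: float, target_rate: float) -> float:
    """Distance of the scale factor from 1 (no modification), on a log scale."""
    return abs(math.log(duration_scale(source_rate, target_rate)))


def select_database(target_rate: float, rates: dict[str, float] = DATABASE_RATES) -> str:
    """Return the name of the database that needs the least duration modification."""
    return min(rates, key=lambda name: modification_extent(rates[name], target_rate))


if __name__ == "__main__":
    target = 9.0  # hypothetical target rate
    chosen = select_database(target)
    # Conventional approach: always modify the standard-rate (medium) database.
    conventional = duration_scale(DATABASE_RATES["medium"], target)
    proposed = duration_scale(DATABASE_RATES[chosen], target)
    print(chosen, round(conventional, 3), round(proposed, 3))
```

Under these assumed rates, a target between the medium and fast databases is served from whichever of the two needs the smaller duration change, instead of always modifying the standard-rate database.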


Bibliographic details

Source
Electronics and Communications in Japan. Part 2, Electronics | 2005, No. 9 | pp. 38-47 | 10 pages
Authors
Tsuyoshi Masuda; Tomoki Toda; Hiromichi Kawanami; Hiroshi Saruwatari; Kiyohiro Shikano
Author affiliation

Information Technology Laboratory, Corporate Research & Development, Asahi Kasei Corporation, Atsugi, 243-0021 Japan;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: General issues
Keywords
speech synthesis; STRAIGHT; prosody; speech rate; database;


