Published in: 2013 International Conference on Oriental COCOSDA

A syllable-based framework for unit selection synthesis in 13 Indian languages


Abstract

In this paper, we discuss a consortium effort to build text-to-speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages, so a unified framework is required for building TTS systems for them. As Indian languages are syllable-timed, a syllable-based framework is developed. As the quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected and annotated, as the database has to be built from scratch. Various criteria have to be addressed while building the database, namely speaker selection, pronunciation variation, optimal text selection, handling of out-of-vocabulary words, and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next, the corpus design for each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semiautomatic labeling tool. Text-to-speech synthesizers are built for all 13 languages, namely Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia and Bodo, using the same common framework. The TTS systems are evaluated using the degradation Mean Opinion Score (DMOS) and the Word Error Rate (WER). An average DMOS of about 3.0 and an average WER of about 20% are observed across all the languages.
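
The abstract reports WER but does not say how it was computed. As a rough illustration only, the sketch below uses the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference sentence and a listener's transcription, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace. The paper's actual
    scoring procedure is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a five-word reference transcribed with one word missing gives a WER of 1/5, which is the same order as the roughly 20% average the abstract reports.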
