Customization of IBM Intu’s Voice by Connecting Text-to-Speech Services and a Voice Conversion Network

Abstract

IBM has recently launched Project Intu, which extends the existing web-based cognitive service Watson with the Internet of Things to provide an intelligent personal assistant service. We propose a voice customization service that allows a user to directly customize the voice of Intu. The method for voice customization is based on IBM Watson’s text-to-speech service and a voice conversion model. A user can train the voice conversion model by providing a minimum of approximately 100 speech samples in the preferred voice (target voice). The output voice of Intu (source voice) is then converted into the target voice. Furthermore, the user does not need to supply parallel data for the target voice, since the transcriptions of the source speech and target speech are the same. We also suggest methods to maximize the efficiency of voice conversion and, based on several experiments, determine the proper amount of target speech. When we measured the elapsed time for each process, we observed that feature extraction accounts for 59.7% of voice conversion time, which implies that fixing inefficiencies in feature extraction should be prioritized. We used the mel-cepstral distortion between the target speech and reconstructed speech as an index for conversion accuracy and found that, when the number of target speech samples for training is fewer than 100, the general performance of the model degrades.
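
The abstract uses mel-cepstral distortion (MCD) as its accuracy index but does not spell out the formula. The sketch below shows one commonly used definition, and it may differ in detail from what the authors computed. It rests on several assumptions: the function name is hypothetical, the two frame sequences are already time-aligned (for example by dynamic time warping), and the 0th (energy) coefficient is excluded.

```python
import math


def mel_cepstral_distortion(target_frames, converted_frames):
    """Mean MCD in dB over two time-aligned sequences of mel-cepstral frames.

    Assumption: each frame is a sequence of floats whose first entry is the
    0th (energy) coefficient, which is skipped here. The abstract does not
    state which convention the authors used.
    """
    if not target_frames or len(target_frames) != len(converted_frames):
        raise ValueError("frame sequences must be non-empty and equally long")

    # Constant factor of the common definition: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for target, converted in zip(target_frames, converted_frames):
        if len(target) != len(converted):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D (0th skipped).
        squared = sum((t - c) ** 2 for t, c in zip(target[1:], converted[1:]))
        total += scale * math.sqrt(squared)

    # Average the per-frame distortion over all frames.
    return total / len(target_frames)
```

A lower value means the converted speech is spectrally closer to the target, so under this definition a rise in MCD as the number of training samples falls below 100 would correspond to the degradation the abstract reports.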
