Venue: International Conference on Speech and Computer

Creating Expressive TTS Voices for Conversation Agent Applications

Abstract

Text-to-Speech has traditionally been viewed as a "black box" component, where standard "portfolio" voices are typically offered with a professional but "neutral" speaking style. For commercially important languages, many different portfolio voices may be offered, all with similar speaking styles. A customer wishing to use TTS will typically choose one of these voices. The only alternative is to opt for a "custom voice" solution. In this case, a customer pays for a TTS voice to be created using their preferred voice talent. Such an approach allows for some "tuning" of the scripts used to create the voice. Limited script elements may be added to provide better coverage of the customer's expected domain, and "gilded phrases" can be included to ensure that specific phrase fragments are spoken perfectly. However, even with such an approach, the recording style is strictly controlled, and standard scripts are augmented rather than redesigned from scratch. The "black box" approach to TTS allows systems to be produced that satisfy the needs of a large number of customers, even if this means that solutions may be limited in the persona they present. Recent advances in conversational agent applications have changed people's expectations of how a computer voice should sound and interact. Suddenly, it is much more important for the TTS system to present a persona that matches the goals of the application. Such systems demand a more flamboyant, upbeat and expressive voice. The "black box" approach is no longer sufficient; voices for high-end conversational agents are being explicitly "designed" to meet the needs of such applications. These voices are both expressive and light in tone, and a complete contrast to the more conservative voices available for traditional markets. This paper will describe how Nuance is addressing this new and challenging market.
