Venue: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Abstract

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) in new languages is hard. We propose a novel architecture to facilitate it for multiple languages while using less than 3% of the data used by the state-of-the-art results on English. We treat TN as a sequence classification problem and propose a granular tokenization mechanism that enables the system to learn the majority of the classes and their normalizations from the training data itself. This is further combined with minimal pre-coded linguistic knowledge for other classes. We publish the first results on TN for TTS in Spanish and Tamil and also demonstrate that the performance of the approach is comparable with previous work on English.
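
The abstract does not spell out the tokenization rules, so the following is only a minimal sketch of what a granular tokenizer could look like. It assumes that tokens are cut at whitespace and at every change between letters, digits, and symbols. The function names `char_class` and `granular_tokenize` and the class labels are hypothetical and are not taken from the paper.

```python
import unicodedata
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse class (an assumed scheme, not the paper's)."""
    category = unicodedata.category(ch)
    if category[0] in "LM":   # letters and combining marks stay together
        return "alpha"
    if category[0] == "N":    # decimal digits and other numeric characters
        return "digit"
    if ch.isspace():
        return "space"
    return "other"            # punctuation, currency signs, other symbols


def granular_tokenize(text):
    """Split text at whitespace and at every change of character class."""
    tokens = []
    for cls, group in groupby(text, key=char_class):
        chunk = "".join(group)
        if cls == "space":
            continue              # whitespace only separates tokens
        if cls == "other":
            tokens.extend(chunk)  # each symbol becomes its own token
        else:
            tokens.append(chunk)
    return tokens


print(granular_tokenize("Call 911 at 5:30pm, $20."))
```

Under this assumption, a string such as "5:30pm" becomes the four tokens "5", ":", "30", "pm", so a classifier can assign a class to each small unit instead of to the whole whitespace-delimited word.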