Data Augmentation for Transformer-based G2P

Abstract

The Transformer model has been shown to outperform other neural seq2seq models in several character-level tasks. It is unclear, however, if the Transformer would benefit as much as other seq2seq models from data augmentation strategies in the low-resource setting. In this paper we explore methods for data augmentation in the g2p task together with the Transformer model. Our results show that a relatively simple alignment-based approach of identifying consistent input-output subsequences in grapheme-phoneme data combined with a subsequent splicing together of such pieces to generate hallucinated data works well in the low-resource setting, often delivering substantial performance improvement over a standard Transformer model.
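The splicing idea described above can be illustrated with a minimal sketch. The chunk inventory, the function names (hallucinate_pair, hallucinate_corpus), and the random sampling scheme below are illustrative assumptions, not the paper's exact procedure; the sketch only shows how aligned grapheme-phoneme pieces harvested from real dictionary entries might be recombined into synthetic training pairs for the low-resource setting.

```python
import random

# Illustrative sketch (not the paper's exact method): splice aligned
# grapheme->phoneme chunks into "hallucinated" training pairs.
# Each chunk is a (grapheme substring, phoneme subsequence) pair that an
# aligner judged to be a consistent correspondence in the real data.
CHUNKS = [
    ("sh", ["SH"]),
    ("ee", ["IY"]),
    ("p", ["P"]),
    ("ing", ["IH", "NG"]),
    ("tion", ["SH", "AH", "N"]),
]

def hallucinate_pair(min_chunks=2, max_chunks=4, rng=random):
    """Splice randomly chosen aligned chunks into one synthetic word/pronunciation pair."""
    n = rng.randint(min_chunks, max_chunks)
    graphemes, phonemes = [], []
    for _ in range(n):
        g, p = rng.choice(CHUNKS)
        graphemes.append(g)
        phonemes.extend(p)
    return "".join(graphemes), phonemes

def hallucinate_corpus(size, seed=0):
    """Generate `size` synthetic pairs to mix with the real low-resource training data."""
    rng = random.Random(seed)
    return [hallucinate_pair(rng=rng) for _ in range(size)]

if __name__ == "__main__":
    for word, pron in hallucinate_corpus(5):
        print(word, " ".join(pron))
```

In a setup like this, the synthetic pairs would simply be concatenated with the real grapheme-phoneme lexicon before training the Transformer; the alignment step that produces the chunk inventory is assumed to have been run beforehand.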