Keynote Talk: Advancing Technological Equity in Speech and Language Processing

Abstract

Accelerating advances in AI and deep neural networks have powered the proliferation of speech and language technologies in applications such as virtual assistants, smart speakers, and reading machines. These technologies have performed impressively well, achieving human parity in speech recognition accuracy and speech synthesis naturalness. As they continue to permeate our daily lives, they need to support diverse users and usage contexts with inputs that deviate from the mainstream. Examples include non-native speakers, code-switching, speech carrying myriad emotions and styles, and speakers with impairments and disorders. In such contexts, existing technologies often suffer performance degradation and fail to fulfill the needs of the users. The crux of the problem lies in data scarcity and data sparsity, which are exacerbated by high data variability. This talk presents an overview of some of the approaches we have used to address the challenges of data shortage, positioned at various stages along the processing pipeline. They include: data augmentation based on speech signal perturbations, use of pre-trained representations, learning speech representation disentanglement, knowledge distillation architectures, meta-learned model re-initialization, and adversarially trained models. The effectiveness of these approaches is demonstrated through a variety of applications, including accented speech recognition, dysarthric speech recognition, code-switched speech synthesis, disordered speech reconstruction, one-shot voice conversion, and exemplar-based emotive speech synthesis. These efforts strive to develop speech and language technologies that can gracefully adapt to and accommodate a diversity of user needs and usage contexts, in order to achieve technological equity in our society.