Keynote Talk: Advancing Technological Equity in Speech and Language Processing

Abstract

Accelerating advances in AI and deep neural networks have powered the proliferation of speech and language technologies in applications such as virtual assistants, smart speakers, and reading machines. The technologies have performed impressively well, achieving human parity in speech recognition accuracy and speech synthesis naturalness. As these technologies continue to permeate our daily lives, they need to support diverse users and usage contexts with inputs that deviate from the mainstream. Examples include non-native speakers, code-switching, speech carrying myriad emotions and styles, and speakers with impairments and disorders. In such contexts, existing technologies often suffer performance degradation and fail to fulfill the needs of the users. The crux of the problem lies in data scarcity and data sparsity, which are exacerbated by high data variability. This talk presents an overview of some of the approaches we have used to address the challenges of data shortage, positioned at various stages along the processing pipeline. They include data augmentation based on speech signal perturbations, use of pre-trained representations, learning speech representation disentanglement, knowledge distillation architectures, meta-learned model re-initialization, and adversarially trained models. The effectiveness of these approaches is demonstrated through a variety of applications, including accented speech recognition, dysarthric speech recognition, code-switched speech synthesis, disordered speech reconstruction, one-shot voice conversion, and exemplar-based emotive speech synthesis. These efforts strive to develop speech and language technologies that can gracefully adapt to and accommodate a diversity of user needs and usage contexts, in order to achieve technological equity in our society.

Bibliographic details

Source
International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics | 2021 | pp. xxxiii-xxxiv | 2 pages
Author
Helen Meng
Original format PDF
