Speech Analysis in the Big Data Era

Abstract

Spoken language analysis tasks often have to rely on comparatively small corpora of one to a few hours of speech, mostly annotated for a single phenomenon at a time, such as a particular speaker state. In stark contrast, engines for the recognition of speakers' emotions, sentiment, personality, or pathologies are often expected to run independently of the speaker, the spoken content, and the acoustic conditions. This lack of large and richly annotated material likely explains much of the headroom for improvement in the accuracy of today's engines. Yet in the big data era, the increasing availability of crowd-sourcing services and recent advances in weakly supervised learning open new opportunities to ease this shortage. In this light, this contribution first shows the de facto standard of data availability across a broad range of speaker analysis tasks. It then introduces highly efficient 'cooperative' learning strategies, based on combining active and semi-supervised learning with transfer learning, to best exploit available data in combination with data synthesis. Further, approaches to estimating meaningful confidence measures in this domain are suggested, as they form (part of) the basis of the weakly supervised learning algorithms. In addition, first successful approaches towards holistic speech analysis are presented, using deep recurrent, rich multi-target learning with partially missing label information. Finally, steps towards the distributed processing needed for big data handling are demonstrated.
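The abstract does not spell out how the 'cooperative' combination of active and semi-supervised learning works. One plausible reading, sketched below purely as an illustration, uses a confidence score to decide per utterance whether the machine's own label is trusted (semi-supervised step) or a human annotator is asked (active step). The function name `route_unlabeled`, the 0.8 threshold, and the example utterances are assumptions made for this sketch and do not come from the source.

```python
from typing import Dict, List, Tuple


def route_unlabeled(
    predictions: List[Tuple[str, str, float]],
    confidence_threshold: float = 0.8,  # assumed value, not from the source
) -> Dict[str, List[Tuple[str, str]]]:
    """Split unlabeled items by model confidence.

    Each prediction is (item_id, predicted_label, confidence in [0, 1]).
    Items at or above the threshold keep the machine label; the rest are
    queued for human annotation together with the model's tentative label.
    """
    machine_labeled = []
    ask_human = []
    for item_id, label, confidence in predictions:
        if confidence >= confidence_threshold:
            machine_labeled.append((item_id, label))
        else:
            ask_human.append((item_id, label))
    return {"machine_labeled": machine_labeled, "ask_human": ask_human}


# Hypothetical utterances with made-up labels and confidences.
routed = route_unlabeled(
    [
        ("utt1", "happy", 0.95),
        ("utt2", "angry", 0.55),
        ("utt3", "neutral", 0.80),
    ]
)
```

In such a scheme the quality of the confidence estimate directly controls how much human effort is spent and how many wrong machine labels enter the training set, which is consistent with the abstract's point that confidence measures form part of the basis of the weakly supervised algorithms.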
