EURASIP Journal on Advances in Signal Processing

Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus


Abstract

Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper introduces a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing under adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed; it compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results aimed at making these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results using all speakers are included, and conclusions as well as suggested areas of research are given.
