EURASIP Journal on Advances in Signal Processing

Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus


Abstract

Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper introduces a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing under adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed; it compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results aimed at making these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results using all speakers are included, and conclusions as well as suggested areas of research are given.
