Journal: International Journal of Computational Intelligence and Applications

VISUAL SPEECH RECOGNITION USING OPTICAL FLOW AND SUPPORT VECTOR MACHINES


Abstract

This paper presents a lip-reading technique that identifies visemes from visual data alone, without evaluating the corresponding acoustic signals. The technique analyzes the vertical components of the optical flow (OF), which are classified using support vector machines (SVM). The OF is decomposed into multiple non-overlapping, fixed-scale blocks, and statistical features of each block are computed for successive video frames of an utterance. The technique also performs automatic temporal segmentation of utterances (i.e., it determines the start and the end of each utterance) using a pair-wise pixel comparison method, which evaluates the differences in intensity of corresponding pixels in two successive frames. The experiments were conducted on a database of 14 visemes recorded from seven subjects, and accuracy was tested using five-fold and ten-fold cross-validation for binary and multiclass SVM, respectively, to determine the impact of subject variations. The results indicate that the proposed method is more robust to inter-subject variations than other systems in the literature, with high sensitivity and specificity for 12 of the 14 visemes. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improved speech-based computer control in noisy environments.
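
The two processing steps named in the abstract, pair-wise pixel comparison for temporal segmentation and per-block statistics of the vertical optical flow, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state thresholds, block sizes, or which statistics are used, so `pixel_thresh`, `frame_thresh`, `block`, and the choice of mean and standard deviation are all assumptions, as are the function names. Frames and the vertical flow field are assumed to be given as 2-D lists of numbers; computing the optical flow itself and training the SVM are left out.

```python
from statistics import mean, pstdev


def changed_fraction(prev, curr, pixel_thresh=10):
    """Fraction of corresponding pixels whose intensity differs by more
    than pixel_thresh (threshold value is an assumption)."""
    total = changed = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += 1
            if abs(p - c) > pixel_thresh:
                changed += 1
    return changed / total if total else 0.0


def segment_utterance(frames, pixel_thresh=10, frame_thresh=0.05):
    """Return (start, end) frame indices spanning all detected motion,
    or None if no pair of successive frames differs enough."""
    moving = [
        i
        for i in range(len(frames) - 1)
        if changed_fraction(frames[i], frames[i + 1], pixel_thresh) > frame_thresh
    ]
    if not moving:
        return None
    # The last changing pair is (moving[-1], moving[-1] + 1).
    return moving[0], moving[-1] + 1


def block_features(vertical_flow, block=2):
    """Mean and population standard deviation of the vertical flow in each
    non-overlapping block x block tile, concatenated in row-major order.
    Partial tiles at the right and bottom edges are dropped."""
    h, w = len(vertical_flow), len(vertical_flow[0])
    feats = []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            vals = [
                vertical_flow[y][x]
                for y in range(r, r + block)
                for x in range(c, c + block)
            ]
            feats.extend([mean(vals), pstdev(vals)])
    return feats
```

In a full pipeline along these lines, the feature vectors from the successive frames of one segmented utterance would be concatenated and passed to an SVM classifier; that stage is omitted here because the abstract gives no details of the kernel or its parameters.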
