Method and device for extracting a visual feature vector from a sequence of images, and speech recognition system

Abstract

A facial feature extraction method, and a device for carrying it out, use the change in light intensity across a front view of a speaker's face. The sequence of video data is scanned and quantised in a uniform pixel arrangement, which forms a coordinate system of scan lines and pixel positions. The left and right eye regions and the mouth region are located by thresholding the pixel grey scale and finding the centroids of the three regions. The perpendicular bisector of the line segment connecting the eye region centroids forms an axis of symmetry. A straight line through the mouth region centroid forms the mouth line. Pixels along the mouth line and along the axis of symmetry form a horizontal and a vertical grey scale profile, respectively. The maxima and minima of these profiles that correspond to important physiological speech features, such as the lower and upper lip, mouth corner, and mouth region positions, are selected as the feature vector. A speech recognition system uses the visual feature vector, in combination with an accompanying acoustic vector, as input to a time-delay neural network.
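The geometric steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the rectangular search windows used to separate the three regions, the fixed threshold, and the nearest-pixel sampling are all assumptions made for illustration; the abstract does not specify them. The sketch thresholds a synthetic grey-scale image, finds the eye and mouth centroids, builds the axis of symmetry as the perpendicular bisector of the eye segment, and picks extrema from the vertical profile.

```python
from math import hypot

def dark_pixels(image, threshold, box):
    """(x, y) coordinates inside box=(x0, y0, x1, y1) with grey level below threshold.
    The search box is a hypothetical way to separate the three regions."""
    x0, y0, x1, y1 = box
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)
            if image[y][x] < threshold]

def centroid(pixels):
    """Mean (x, y) of a non-empty list of pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def symmetry_axis(left_eye, right_eye):
    """Perpendicular bisector of the eye segment as (midpoint, unit direction)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    mid = ((lx + rx) / 2, (ly + ry) / 2)
    dx, dy = rx - lx, ry - ly
    norm = hypot(dx, dy)
    return mid, (-dy / norm, dx / norm)

def profile(image, start, direction, length):
    """Grey levels sampled at unit steps from start along direction (nearest pixel)."""
    values = []
    for t in range(length):
        x = int(round(start[0] + t * direction[0]))
        y = int(round(start[1] + t * direction[1]))
        if 0 <= y < len(image) and 0 <= x < len(image[0]):
            values.append(image[y][x])
    return values

def extrema(values):
    """Indices of strict local minima and maxima of a 1-D profile."""
    minima, maxima = [], []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] < values[i + 1]:
            minima.append(i)
        elif values[i] > values[i - 1] and values[i] > values[i + 1]:
            maxima.append(i)
    return minima, maxima

# Synthetic 12x12 "face": bright skin, two dark eye pixels, one dark mouth row.
W, H = 12, 12
img = [[200] * W for _ in range(H)]   # indexed as img[y][x]
img[3][2] = img[3][8] = 20            # eyes
for x in range(3, 8):
    img[9][x] = 30                    # mouth

left = centroid(dark_pixels(img, 100, (0, 0, 5, 6)))
right = centroid(dark_pixels(img, 100, (5, 0, 12, 6)))
mouth = centroid(dark_pixels(img, 100, (0, 6, 12, 12)))

mid, axis = symmetry_axis(left, right)
vertical = profile(img, mid, axis, 9)                     # along the axis of symmetry
horizontal = profile(img, (0, mouth[1]), (1.0, 0.0), W)   # along the mouth line
v_minima, v_maxima = extrema(vertical)
```

In this toy image the vertical profile has a single dark dip where the axis of symmetry crosses the mouth; on real video the profile would contain several extrema, and the method described in the abstract selects those that correspond to lip and mouth positions.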