International Conference on Virtual Reality and Visualization

A Novel Audio-Oriented Learning Strategies for Character Recognition

Abstract

In this paper, we propose a robust audio-oriented learning strategy for character recognition in movies and TV series. Identifying the major characters in movies and TV series has drawn great interest from researchers. Most existing work explores character recognition and retrieval based on visual appearance, but visual appearance is inconsistent over the course of a video. Our approach, which focuses mainly on audio, has four components: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, together with complementary Gabor features; (ii) we adopt the Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model to learn from all features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with data from talk shows using semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that takes constraints in the time sequence into account to refine the final labelling. Experimental results demonstrate the effectiveness of our approach.
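
Two of the steps named in the abstract can be illustrated with a small sketch: computing short-time energy per frame, and using temporal constraints to smooth a sequence of per-segment labels. The code below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the frame and hop parameters, the score matrix, and the fixed `switch_penalty` are all hypothetical, and the smoothing step is plain Viterbi decoding with a constant label-change penalty, used here as a simple stand-in for a trained CRF.

```python
def short_time_energy(samples, frame_len, hop):
    """Sum of squared amplitudes for each frame that fits entirely in the signal."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def smooth_labels(scores, switch_penalty):
    """Viterbi decoding with a constant penalty for changing label between segments.

    scores[t][k] is a hypothetical score (higher is better) for character k at
    segment t, e.g. produced by a per-segment classifier. This is a stand-in
    for a linear-chain CRF, not a trained model.
    """
    n_labels = len(scores[0])
    best = list(scores[0])   # best total score of any path ending in each label
    back = []                # back-pointers, one list per time step after the first
    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for k in range(n_labels):
            # Staying on the same label is free; switching costs the penalty.
            cands = [best[j] - (0 if j == k else switch_penalty)
                     for j in range(n_labels)]
            j_star = max(range(n_labels), key=lambda j: cands[j])
            new_best.append(cands[j_star] + scores[t][k])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


if __name__ == "__main__":
    print(short_time_energy([1, 2, 3, 4], frame_len=2, hop=2))
    # Segment 2 weakly favours character 1; the penalty keeps the label stable.
    demo_scores = [[2.0, 0.0], [2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
    print(smooth_labels(demo_scores, switch_penalty=1.0))
```

With a penalty of zero the decoder reduces to picking the best label for each segment independently; a positive penalty suppresses isolated label flips, which is the intuition behind using sequence constraints to refine the final labelling.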
