Conference: International Conference on Virtual Reality and Visualization

A Novel Audio-Oriented Learning Strategies for Character Recognition


Abstract

In this paper, we propose a robust audio-oriented learning strategy to address character recognition in movies and TV series. Identifying the major characters in movies and TV series has drawn great interest from researchers. Most existing work explores character recognition and retrieval based on visual appearance, but visual appearance is inconsistent over the course of a video. Our approach focuses mainly on audio and has four features: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, together with complementary Gabor features; (ii) we adopt the Multi-Task Joint Sparse Representation and Classification (MTJSRC) model to learn from all features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with data from talk shows through semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that accounts for constraints in the time sequence to refine the final labelling. Finally, experimental results demonstrate the effectiveness of our approach.
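Step (iv) of the abstract relies on the fact that the speaker rarely changes from one audio segment to the next. The sketch below illustrates that idea with plain Viterbi decoding over a chain with a fixed "stay" probability. This is a simplified stand-in, not the paper's CRF: the function name `smooth_labels`, the `stay_prob` parameter, and the input format (one dictionary of per-speaker probabilities per segment) are all assumptions made for illustration.

```python
import math


def smooth_labels(segment_scores, stay_prob=0.9):
    """Pick the most likely speaker sequence under a simple temporal prior.

    segment_scores: list of dicts, one per audio segment, mapping a speaker
                    label to a positive probability from some per-segment
                    classifier (hypothetical input format).
    stay_prob:      assumed probability (strictly between 0 and 1) that two
                    consecutive segments share the same speaker.
    """
    if not segment_scores:
        return []

    labels = sorted(segment_scores[0])
    n = len(labels)
    # Spread the remaining probability mass evenly over the other speakers.
    switch_prob = (1.0 - stay_prob) / (n - 1) if n > 1 else 1.0

    def log_trans(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # best[l] holds the best log-score of any path ending in label l.
    best = {l: math.log(segment_scores[0][l]) for l in labels}
    back_pointers = []

    for scores in segment_scores[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + log_trans(p, cur))
            new_best[cur] = best[prev] + log_trans(prev, cur) + math.log(scores[cur])
            pointers[cur] = prev
        best = new_best
        back_pointers.append(pointers)

    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda l: best[l])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores: the third segment is a weak, isolated vote for "B".
scores = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.8, "B": 0.2},
    {"A": 0.4, "B": 0.6},
    {"A": 0.9, "B": 0.1},
    {"A": 0.85, "B": 0.15},
]
print(smooth_labels(scores))  # ['A', 'A', 'A', 'A', 'A']
```

Labelling each segment independently would assign the third segment to "B"; the temporal prior overrides that isolated, low-confidence vote. A trained CRF would learn its transition and observation weights from data rather than fixing `stay_prob` by hand.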
