Conference: Pacific Rim Conference on Multimedia

HMM-Based Audio Keyword Generation


Abstract

With the exponential growth in the production of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when people browse and try to understand video content. To detect semantic content from useful audio information, we introduce audio keywords, which are sets of specific sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. A weakness of that work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification, without any contextual information. In this paper, we propose a classification method based on the Hidden Markov Model (HMM) for audio keyword identification, as an improvement over the hierarchical SVM classifier. The choice of HMM is motivated by its success in speech recognition. Unlike frame-based SVM classification followed by majority voting, our proposed HMM-based classifiers treat a specific sound as continuous time series data and use hidden state transitions to capture context information. In particular, we study how to find an effective HMM, i.e., how to determine its topology, observation vectors, and statistical parameters. We also compare HMM structures with different numbers of hidden states and handle time series data of variable length. The experimental data comprise 40 minutes of basketball audio from real sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
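The HMM-based classification idea described in the abstract can be illustrated with a minimal sketch: one HMM per audio keyword, with each variable-length feature sequence assigned to the keyword whose model gives the highest likelihood under the forward algorithm. This is not the authors' implementation. The keyword names (`whistle`, `applause`), the 2-state left-to-right topology, the unit-variance Gaussian emissions, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_gauss(x, means, variances):
    """Log density of a diagonal-covariance Gaussian, one value per state."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)

def log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain: returns log P(obs | model)."""
    alpha = log_pi + log_gauss(obs[0], means, variances)
    for x in obs[1:]:
        # alpha_new[j] = logsumexp_i(alpha[i] + log_A[i, j]) + log b_j(x)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
        alpha = alpha + log_gauss(x, means, variances)
    return np.logaddexp.reduce(alpha)

def make_model(means):
    """Hypothetical 2-state left-to-right HMM with unit variances."""
    means = np.asarray(means, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) = -inf marks forbidden transitions
        log_pi = np.log(np.array([1.0, 0.0]))             # always start in state 0
        log_A = np.log(np.array([[0.7, 0.3], [0.0, 1.0]]))  # no backward jumps
    return log_pi, log_A, means, np.ones_like(means)

# One model per audio keyword; names and parameters are illustrative assumptions.
MODELS = {
    "whistle": make_model([[0.0, 0.0], [1.0, 1.0]]),
    "applause": make_model([[5.0, 5.0], [6.0, 6.0]]),
}

def classify(obs):
    """Return the keyword whose HMM assigns the sequence the highest likelihood."""
    obs = np.asarray(obs, dtype=float)
    scores = {name: log_likelihood(obs, *params) for name, params in MODELS.items()}
    return max(scores, key=scores.get)

# Sequences may differ in length; each row is one toy observation vector.
print(classify([[0.1, -0.1], [0.2, 0.1], [0.9, 1.1], [1.0, 0.9]]))
print(classify([[5.1, 4.9], [5.9, 6.2]]))
```

In contrast to per-frame classification followed by majority voting, the score here depends on the order of the observations through the transition matrix, which is how the hidden states carry context. In practice the parameters would be estimated from labeled training audio rather than set by hand.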
