Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Adaptive Mid-Term Representations for Robust Audio Event Classification



Abstract

Low-level audio features are commonly used in audio analysis tasks such as audio scene classification and acoustic event detection. Because audio signals vary in length, a common approach is to build fixed-length feature vectors from a set of statistics that summarize the temporal variability of these short-term features. To avoid losing temporal information, the audio event can be divided into a set of mid-term segments, or texture windows. However, this approach requires accurate estimates of the onset and offset times of the audio events in order to obtain a robust mid-term statistical description of their temporal evolution. This paper proposes an alternative event representation based on nonlinear time normalization applied before the extraction of mid-term statistics. The short-term features are transformed into a new fixed-length representation obtained by uniform distance subsampling over a defined feature space, in contrast to classical short-term temporal framing. The results show that distance-based texture windows provide an improved statistical description of the event that is robust to errors in the event segmentation stage under noisy conditions.
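
The idea of uniform distance subsampling can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the feature space, the distance measure, or the statistics used, so the Euclidean distance, the linear interpolation, the mean and standard deviation statistics, and the names `distance_resample` and `mid_term_stats` are all assumptions made for illustration. The sketch resamples a sequence of short-term feature vectors at equal steps of cumulative distance along their trajectory, then summarizes equal-sized texture windows of the resampled points.

```python
import math
from statistics import fmean, pstdev


def distance_resample(frames, n_points):
    """Resample a feature trajectory at uniform steps of cumulative distance.

    frames: list of equal-length feature vectors, one per short-term frame.
    Assumption: Euclidean distance and linear interpolation between frames.
    """
    if not frames or n_points < 2:
        raise ValueError("need at least one frame and n_points >= 2")
    # Cumulative path length travelled through feature space at each frame.
    cum = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cum.append(cum[-1] + math.dist(prev, cur))
    total = cum[-1]
    if total == 0.0:
        # The features never change: every resampled point is the first frame.
        return [list(frames[0]) for _ in range(n_points)]
    out, seg = [], 0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # Advance to the segment between frames seg and seg + 1 that
        # contains the target distance.
        while seg < len(frames) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        w = 0.0 if span == 0.0 else (target - cum[seg]) / span
        w = min(max(w, 0.0), 1.0)  # guard against rounding at the end points
        out.append([a + w * (b - a)
                    for a, b in zip(frames[seg], frames[seg + 1])])
    return out


def mid_term_stats(points, n_windows):
    """Concatenate per-dimension mean and standard deviation of each window.

    Windows are contiguous and equal-sized; leftover points at the end are
    dropped when len(points) is not a multiple of n_windows.
    """
    size = len(points) // n_windows
    if size < 1:
        raise ValueError("more windows than points")
    vector = []
    for i in range(n_windows):
        window = points[i * size:(i + 1) * size]
        for dim in zip(*window):
            vector += [fmean(dim), pstdev(dim)]
    return vector


# Toy one-dimensional event, and the same event with three stationary frames
# prepended to mimic an onset that was detected too early.
clean = [[0.0], [1.0], [2.0], [3.0], [4.0]]
padded = [[0.0], [0.0], [0.0]] + clean

features_clean = mid_term_stats(distance_resample(clean, 8), 2)
features_padded = mid_term_stats(distance_resample(padded, 8), 2)
```

In this idealized, noise-free toy case the padded frames add no distance along the trajectory, so both inputs resample to the same points, whereas time-based windows would be shifted by the extra frames. With real noisy features the padding would contribute some distance, so the sketch only shows the intuition and does not reproduce the paper's results.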
