Journal: Multimedia Tools and Applications

Acoustic event diarization in TV/movie audios using deep embedding and integer linear programming


Abstract

In this study, we propose a method for acoustic event diarization that combines a deep embedding feature with an integer linear programming (ILP) clustering algorithm. The deep embedding, learned by a deep auto-encoder network, represents the properties of different classes of acoustic events; integer linear programming is then used to merge audio segments that belong to the same class of acoustic events. Four kinds of TV/movie audio (21.5 h in total) serve as experimental data: Sport, Situation comedy, Award ceremony, and Action movie. We first compare the deep embedding with state-of-the-art features. We then compare the ILP clustering algorithm with other clustering algorithms adopted in previous works. Finally, we compare the proposed method with both supervised and unsupervised methods on the four kinds of TV/movie audio. In terms of acoustic event error, the proposed method outperforms other unsupervised methods based on agglomerative information bottleneck, Bayesian information criterion, and spectral clustering, and is only slightly inferior to the supervised method based on a deep neural network.
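The abstract does not give the ILP formulation, so the Python sketch below is only an illustration under stated assumptions. It uses a centre-selection objective of the kind found in ILP clustering for diarization: choose as few segments as possible to act as cluster centres, and assign every segment to a centre no farther than a threshold. The function name `ilp_cluster`, the threshold `delta`, the Euclidean distance, and the 2-D toy vectors standing in for deep embeddings are all hypothetical. Exhaustive enumeration replaces a real ILP solver, which is only practical for a handful of segments.

```python
from itertools import combinations
from math import dist


def ilp_cluster(points, delta):
    """Solve a tiny centre-selection ILP exactly by enumeration.

    Assumed formulation (not taken from the paper): binary x[k][j] = 1
    means segment j is assigned to centre k, and x[k][k] = 1 means
    segment k is a centre. Minimise
        sum_k x[k][k] + (1 / (N * delta)) * sum_{k,j} d(k, j) * x[k][j]
    subject to: every j has exactly one centre, x[k][j] <= x[k][k],
    and d(k, j) <= delta whenever x[k][j] = 1.
    """
    n = len(points)
    best_cost, best_assign = float("inf"), None
    for size in range(1, n + 1):
        for centres in combinations(range(n), size):
            assign, total = [], 0.0
            for j in range(n):
                # With the centres fixed, the nearest centre is optimal.
                k = min(centres, key=lambda c: dist(points[c], points[j]))
                d = dist(points[k], points[j])
                if d > delta:
                    break  # distance constraint violated: infeasible
                assign.append(k)
                total += d
            else:
                cost = size + total / (n * delta)
                if cost < best_cost:
                    best_cost, best_assign = cost, assign
    return best_assign, best_cost


# Toy "embeddings": two well-separated groups of three segments each.
segments = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels, cost = ilp_cluster(segments, delta=1.0)
print(labels)  # each entry is the index of the segment chosen as centre
```

Because every assigned distance is at most `delta`, the distance term never exceeds 1, so the objective first minimises the number of centres and then prefers the most compact assignment. Segments that share a centre index would be merged into one acoustic event class.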
