Acoustic event diarization in TV/movie audios using deep embedding and integer linear programming

Li Yanxiong; Zhang Yuhan; Li Xianku; Liu Mingle; Wang Wucheng; Yang Jichen

Abstract

In this study, we propose a method for acoustic event diarization that combines deep-embedding features with integer linear programming (ILP) clustering. A deep auto-encoder network learns the deep embedding, which represents the properties of the different classes of acoustic events; ILP is then used to merge audio segments that belong to the same class of acoustic event. Four kinds of TV/movie audio (21.5 h in total) serve as experimental data: Sport, Situation comedy, Award ceremony, and Action movie. We compare the deep embedding with state-of-the-art features, and we compare ILP clustering with other clustering algorithms adopted in previous work. Finally, we compare the proposed method with both supervised and unsupervised methods on the four kinds of TV/movie audio. The results show that, in terms of acoustic event error, the proposed method outperforms unsupervised methods based on the agglomerative information bottleneck, the Bayesian information criterion, and spectral clustering, and is only slightly inferior to a supervised method based on a deep neural network.
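
The abstract does not spell out the ILP objective. As a minimal sketch, assuming the ILP clustering model often used in speaker diarization (one unit of cost per cluster centre, plus segment-to-centre distances normalised by a threshold delta) and assuming cosine distance between segment embeddings, the merging step might look like the Python below. The names `cosine_distance`, `ilp_cluster`, and `delta`, and the toy vectors standing in for auto-encoder embeddings, are hypothetical and not taken from the paper. The exhaustive search over centre sets returns the exact ILP optimum but only scales to a handful of segments; a practical system would pass the same model to an ILP solver.

```python
import itertools
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def ilp_cluster(embeddings, delta):
    """Solve a small ILP clustering model exactly by exhaustive search.

    Assumed model (hypothetical here, in the style of ILP speaker diarization):
      minimise  sum_j x_jj + (1 / delta) * sum_{i,j} d_ij * x_ij
      s.t.      sum_j x_ij = 1        each segment joins exactly one cluster
                x_ij <= x_jj          only centres (x_jj = 1) receive segments
                d_ij * x_ij <= delta  a segment stays within delta of its centre
    Once the set of centres is fixed, the best assignment is "nearest centre
    within delta", so enumerating every centre set yields the ILP optimum.
    """
    n = len(embeddings)
    d = [[cosine_distance(a, b) for b in embeddings] for a in embeddings]
    best_cost, best_assign = math.inf, None
    for k in range(1, n + 1):
        for centres in itertools.combinations(range(n), k):
            cost = float(k)  # one unit of cost per cluster centre
            assign = []
            for i in range(n):
                if i in centres:
                    assign.append(i)  # a centre belongs to its own cluster
                    continue
                # Nearest centre within delta; None means this set is infeasible.
                j = min((c for c in centres if d[i][c] <= delta),
                        key=lambda c: d[i][c], default=None)
                if j is None:
                    break
                assign.append(j)
                cost += d[i][j] / delta
            else:  # runs only if every segment found a centre
                if cost < best_cost:
                    best_cost, best_assign = cost, assign
    # Relabel centre indices as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(c, len(ids)) for c in best_assign]


# Toy 3-D vectors standing in for auto-encoder embeddings of five segments.
segments = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.95, 0.2],
]
print(ilp_cluster(segments, delta=0.2))   # [0, 0, 1, 1, 1]
print(ilp_cluster(segments, delta=10.0))  # [0, 0, 0, 0, 0]
```

In this assumed model, delta both caps how far a segment may sit from its centre and scales the distance term, so a larger delta makes each extra cluster relatively more expensive and favours fewer, larger clusters, as the second call shows.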

Bibliographic details

Source: Multimedia Tools and Applications, 2019, no. 23, pp. 33999-34025 (27 pages)
Authors: Li Yanxiong; Zhang Yuhan; Li Xianku; Liu Mingle; Wang Wucheng; Yang Jichen
Affiliation: School of Electronic and Information Engineering, South China University of Technology, Room 223, Shaw Science Building, 381 Wushan Road, Guangzhou 510640, Guangdong, People's Republic of China

Original format: PDF
Language: English
Keywords: Deep embedding; integer linear programming; acoustic event detection; audio content analysis


Similar literature


1. Archival Television Audio: Surviving Television Broadcast Sound Tracks Representing Lost TV Programs (1946-1972) [J]. Gries Phil. ARSC Journal, 2010, no. 2.

2. AXS TV Adds FS4 to Boost Audio, Fiber and Embed/De-Embed Support in Broadcast Truck Loaded with AJA Pro Kit [J]. TV technology, 2017, no. 3Appa.

3. Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection [J]. Li Yanxiong, Zhang Xue, Jin Hai. Multimedia Tools and Applications, 2018, no. 1.

4. Integer Linear Programming for Speaker Diarization and Cross-Modal Identification in TV Broadcast [C]. Hervé Bredin, Johann Poignant. Conference of the International Speech Communication Association, 2013.

5. The cultural translation of U.S. television programs and movies: Subtitle groups as cultural brokers in China [D]. Hsiao, Chi-hua, 2014.

6. Audiovisual infotainment in European news: A comparative content analysis of Dutch Spanish and Irish television news programs [O]. Amanda Alencar, Sanne Kruikemeier.

7. Audiovisual, Genre, Neural and Topical Textual Embeddings for TV Programme Content Representation [O]. Saba Nazir, Taner Cagali, Mehrnoosh Sadrzadeh, 2020.
