Using the MGB-2 challenge data for creating a new multimodal Dataset for speaker role recognition in Arabic TV Broadcasts

Mohamed Lazhar Bellagha; Mounir Zrigui

Abstract

Speaker role recognition is an important component of multimedia analysis for applications such as speaker naming, speaker diarization and video summarization. The lack of labeled datasets for this task has limited the evaluation of algorithms. In this paper, we present a new multimodal dataset for speaker role recognition in Arabic TV programs. The dataset is artificially created from data provided by the Multi-Genre Broadcast (MGB-2) challenge. We also describe our algorithm for processing audio documents to create speaker segments and their corresponding transcripts. The transcripts and speaker segments are automatically annotated with one of three speaker roles: presenter, reporter, or guest. Based on these artificial annotations, we show that taking multimodal information into account is important for predicting speaker roles. We present monomodal and multimodal speaker role recognition approaches on speaker segments mined from television programs, with audio and textual classification baselines for the three-way labeling of presenter, reporter and guest.
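
The abstract does not say how the audio and textual information are combined. As a minimal illustration of a multimodal prediction over the three roles, the Python sketch below assumes two already-trained monomodal classifiers that each output one probability per role, and merges them by weighted late fusion. This fusion scheme, the function names `late_fusion` and `predict_role`, the `audio_weight` parameter and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence

# The three speaker roles named in the abstract.
ROLES = ("presenter", "reporter", "guest")


def late_fusion(
    audio_probs: Sequence[float],
    text_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine per-role probabilities from two monomodal classifiers.

    Assumption (not from the paper): both classifiers output one probability
    per role, in ROLES order, and fusion is a simple weighted average.
    """
    if len(audio_probs) != len(ROLES) or len(text_probs) != len(ROLES):
        raise ValueError("expected one probability per role")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    text_weight = 1.0 - audio_weight
    return {
        role: audio_weight * a + text_weight * t
        for role, a, t in zip(ROLES, audio_probs, text_probs)
    }


def predict_role(
    audio_probs: Sequence[float],
    text_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> str:
    """Return the role with the highest fused score."""
    scores = late_fusion(audio_probs, text_probs, audio_weight)
    return max(scores, key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs for one speaker segment: the audio model leans
    # towards "reporter", the text model strongly towards "presenter".
    audio = [0.20, 0.45, 0.35]
    text = [0.70, 0.20, 0.10]
    print(predict_role(audio, text))                    # fused: presenter
    print(predict_role(audio, text, audio_weight=1.0))  # audio only: reporter
```

With `audio_weight=0.5` the fused scores are about 0.45, 0.325 and 0.225, so the text evidence overrides the audio model's weak preference for reporter. Late fusion is only one option; combining features before classification (early fusion) is a common alternative.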

Bibliographic details

Source: Procedia Computer Science, 2021, issue a, 10 pages
Authors: Mohamed Lazhar Bellagha; Mounir Zrigui
Original format: PDF
Keywords: Speaker role recognition; Multimodal Dataset; Multimodal learning; Arabic Multi-Genre Broadcast

Similar documents

1. Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast [J]. Hervé Bredin, Anindya Roy, Viet-Bac Le. International Journal of Multimedia Information Retrieval, 2014, no. 3.

2. Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications [J]. Bochen Li, Xinzhao Liu, Karthik Dinesh. IEEE Transactions on Multimedia, 2019, no. 2.

3. Dataset for the analysis of TV viewer response to live sport broadcasts and sponsor messages [J]. Christoph Breuer, Felix Boronczyk, Christopher Rumpf. Data in Brief, 2021, issue a.

4. ALIF: A dataset for Arabic embedded text recognition in TV broadcast [C]. Sonia Yousfi, Sid-Ahmed Berrani, Christophe Garcia. International Conference on Document Analysis and Recognition, 2015.

5. Fine-grained Activity Recognition Using Multimodal Datasets [D]. Young Chol Song. 2016.

6. The Automatic Detection of Chronic Pain-Related Expression: Requirements, Challenges and the Multimodal EmoPain Dataset [O]. Min S. H. Aung, Sebastian Kaltwang, Bernardino Romera-Paredes.

7. The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition [O]. Ahmed Ali, Peter Bell, James Glass. 2017.

8. Metadata Wizard: An Easy-to-Use Tool for Creating FGDC-CSDGM Metadata for Geospatial Datasets in ESRI ArcGIS Desktop [R]. D. A. Ignizio, M. S. O'Donnell, C. B. Talbert. 2014.
