Procedia Computer Science

Using the MGB-2 challenge data for creating a new multimodal Dataset for speaker role recognition in Arabic TV Broadcasts

Abstract

Speaker role recognition is an important component of multimedia analysis, with applications such as speaker naming, speaker diarization, and video summarization. The lack of labeled datasets for this task has constrained the evaluation of algorithms. In this paper, we present a new multimodal dataset for speaker role recognition in Arabic TV programs. The dataset is artificially created from data provided by the Multi-Genre Broadcast (MGB-2) challenge dataset. We also describe our algorithm for processing audio documents and creating speaker segments with their corresponding transcripts. The spoken transcripts and the speaker segments are automatically annotated with one of three speaker roles: presenter, reporter, or guest. Based on these artificial annotations, we demonstrate the importance of taking multimodal information into account when predicting speaker role. We present monomodal and multimodal speaker role recognition approaches on speaker segments mined from television programs, with audio and textual classification baselines for this three-way role labeling.
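
The abstract does not say how the audio and textual predictions are combined. As an illustration only, the sketch below assumes a simple late-fusion scheme in which each modality's classifier outputs a probability per role and the two distributions are averaged with a weight. The function names, the `audio_weight` parameter, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict

# The three roles named in the abstract.
ROLES = ("presenter", "reporter", "guest")


def late_fusion(audio_probs: Dict[str, float],
                text_probs: Dict[str, float],
                audio_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of per-role probabilities from two modalities.

    Assumption: each input maps every role in ROLES to a probability.
    This is a generic late-fusion sketch, not the paper's method.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = {
        role: audio_weight * audio_probs[role]
        + (1.0 - audio_weight) * text_probs[role]
        for role in ROLES
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("all fused scores are zero")
    # Renormalise so the fused scores sum to 1.
    return {role: score / total for role, score in fused.items()}


def predict_role(audio_probs: Dict[str, float],
                 text_probs: Dict[str, float],
                 audio_weight: float = 0.5) -> str:
    """Return the role with the highest fused probability."""
    fused = late_fusion(audio_probs, text_probs, audio_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up scores for one speaker segment.
    audio = {"presenter": 0.5, "reporter": 0.3, "guest": 0.2}
    text = {"presenter": 0.2, "reporter": 0.2, "guest": 0.6}
    print(predict_role(audio, text))
```

Setting `audio_weight` to 1.0 or 0.0 reduces the sketch to an audio-only or text-only (monomodal) baseline, which makes the two settings easy to compare on the same segments.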
