OCR-aided person annotation and label propagation for speaker modeling in TV shows

Abstract

In this paper, we present an approach for minimizing human effort in manual speaker annotation. Label propagation is used at each iteration of an active learning cycle. More precisely, a selection strategy for choosing the most suitable speech track to be labeled is proposed. Four different selection strategies are evaluated and all the tracks in a corresponding cluster are gathered using agglomerative clustering in order to propagate human annotations. To further reduce the manual labor required, an optical character recognition system is used to bootstrap annotations. At each step of the cycle, annotations are used to build speaker models. The quality of the generated speaker models is evaluated at each step using an i-vector based speaker identification system. The presented approach shows promising results on the REPERE corpus with a minimum amount of human effort for annotation.
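The cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the cluster assignments have already been produced (for example by agglomerative clustering), treats the human annotator as a callable `oracle`, and uses a hypothetical "largest unlabeled cluster first" rule, since the abstract does not name the four selection strategies it evaluates. The function name `propagate_labels` and all parameter names are illustrative. Rebuilding speaker models and the i-vector evaluation at each step are left out.

```python
from collections import Counter, defaultdict


def propagate_labels(clusters, ocr_labels, oracle, budget):
    """Active-learning loop with cluster-level label propagation (sketch).

    clusters:   dict track_id -> cluster_id (assumed precomputed)
    ocr_labels: dict track_id -> name, the OCR bootstrap annotations
    oracle:     callable track_id -> name, standing in for the annotator
    budget:     maximum number of oracle queries
    """
    members = defaultdict(list)
    for track, cluster in clusters.items():
        members[cluster].append(track)

    labels = {}
    # Bootstrap: spread the majority OCR-derived name over each cluster.
    for tracks in members.values():
        names = [ocr_labels[t] for t in tracks if t in ocr_labels]
        if names:
            majority = Counter(names).most_common(1)[0][0]
            for t in tracks:
                labels[t] = majority

    queries = 0
    while queries < budget:
        # Clusters are labeled as a whole, so checking one track suffices.
        unlabeled = [c for c, ts in members.items() if ts[0] not in labels]
        if not unlabeled:
            break
        # Hypothetical selection strategy: largest unlabeled cluster first.
        target = max(unlabeled, key=lambda c: len(members[c]))
        name = oracle(members[target][0])  # one manual annotation
        queries += 1
        for t in members[target]:          # propagate to the whole cluster
            labels[t] = name
    return labels, queries


# Toy usage: six speech tracks in three clusters, one OCR hit, one query.
clusters = {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "t5": 1, "t6": 2}
truth = {"t1": "Alice", "t2": "Alice", "t3": "Bob",
         "t4": "Bob", "t5": "Bob", "t6": "Carol"}
labels, queries = propagate_labels(clusters, {"t1": "Alice"}, truth.get, 1)
```

In this toy run a single manual annotation labels three tracks and the OCR hit labels two more, which is the kind of saving in human effort the abstract is aiming at; a real system would also retrain and evaluate the speaker models after every query.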

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing, 2016, pp. 5570-5574 (5 pages)
Authors
Mateusz Budnik; Laurent Besacier; Ali Khodabakhsh; Cenk Demiroglu
Original format: PDF
Keywords
OCR; active learning; annotation propagation; clustering; speaker identification;

Date indexed: 2022-08-26 15:24:17

Similar literature

1. Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast [J]. Hervé Bredin, Anindya Roy, Viet-Bac Le, International Journal of Multimedia Information Retrieval. 2014, No. 3
2. Manual Annotation And Automatic Image Processing Of Multimodal Emotional Behaviors: Validating The Annotation Of Tv Interviews [J]. J.-C. Martin, G. Caridakis, L. Devillers, Personal and Ubiquitous Computing. 2009, No. 1
3. Modelling of ultra high frequency television band radio signal propagation in underground mine environment [J]. Vujic Dejan S., Certic Jelena D., Wireless Networks. 2019, No. 4
4. OCR-aided person annotation and label propagation for speaker modeling in TV shows [C]. Mateusz Budnik, Laurent Besacier, Ali Khodabakhsh, IEEE International Conference on Acoustics, Speech and Signal Processing. 2016
5. Unsupervised speaker identification for TV news [D]. Woo, Daniel N. 2014
6. Propagation curves and coverage areas of digital terrestrial television base stations in the tropical zone [O]. A. Akinbolati, M.O. Ajewole, A.T. Adediji, 2020
7. OCR-aided person annotation and label propagation for speaker modeling in TV shows [O]. Mateusz Budnik, Laurent Besacier, Ali Khodabakhsh, 2016
