Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

OCR-aided person annotation and label propagation for speaker modeling in TV shows

Abstract

In this paper, we present an approach for minimizing human effort in manual speaker annotation. Label propagation is used at each iteration of an active learning cycle. More precisely, a selection strategy for choosing the most suitable speech track to be labeled is proposed. Four different selection strategies are evaluated and all the tracks in a corresponding cluster are gathered using agglomerative clustering in order to propagate human annotations. To further reduce the manual labor required, an optical character recognition system is used to bootstrap annotations. At each step of the cycle, annotations are used to build speaker models. The quality of the generated speaker models is evaluated at each step using an i-vector based speaker identification system. The presented approach shows promising results on the REPERE corpus with a minimum amount of human effort for annotation.
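
The cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster assignments are assumed to come from a separate agglomerative clustering step, the "largest unlabeled cluster" selection rule is a hypothetical stand-in for the four strategies the paper evaluates, and the speaker modeling and i-vector evaluation performed at each step are omitted. All function names, track ids, and speaker names are illustrative.

```python
from collections import Counter


def propagate(cluster_of, labels):
    """Copy each known label to every track in the same cluster."""
    cluster_label = {}
    for track, name in labels.items():
        # On conflicting labels inside one cluster, keep the first one seen.
        cluster_label.setdefault(cluster_of[track], name)
    return {t: cluster_label[c] for t, c in cluster_of.items() if c in cluster_label}


def select_track(cluster_of, labels):
    """Hypothetical strategy: pick a track from the largest unlabeled cluster."""
    labeled_clusters = {cluster_of[t] for t in labels}
    sizes = Counter(c for c in cluster_of.values() if c not in labeled_clusters)
    if not sizes:
        return None  # every cluster already carries a label
    biggest = max(sorted(sizes), key=lambda c: sizes[c])
    return min(t for t, c in cluster_of.items() if c == biggest)


def annotate(cluster_of, ocr_labels, oracle, budget):
    """Bootstrap from OCR labels, then query the annotator up to `budget` times."""
    labels = propagate(cluster_of, ocr_labels)
    queries = 0
    while queries < budget:
        track = select_track(cluster_of, labels)
        if track is None:
            break
        labels[track] = oracle(track)  # one manual annotation
        queries += 1
        labels = propagate(cluster_of, labels)
    return labels, queries


# Toy data: six speech tracks in three clusters (assumed clustering output).
cluster_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1, "t5": 1, "t6": 2}
truth = {"t1": "Alice", "t2": "Alice", "t3": "Bob", "t4": "Bob", "t5": "Bob", "t6": "Carol"}
ocr_labels = {"t1": "Alice"}  # name assumed to be read from an on-screen caption

labels, queries = annotate(cluster_of, ocr_labels, truth.get, budget=5)
```

In this toy run the OCR label covers the first cluster, so only two manual queries are needed to label all six tracks. With real data, clusters are not perfectly pure, so propagated labels can be wrong and the choice of selection strategy matters more than it does here.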
