A Data Reduction Procedure for 'Principal Cast' and other 'Talking Head' Detection


Abstract

We describe a technique for reducing the data set for "principal cast" and other "talking head" detection in broadcast news content using the spatial attributes of the MPEG-7 Motion Activity descriptor. These descriptors are easy to extract from the compressed domain and work well for matching talking head sequences, which motivated us to use them to rapidly prune the data set before applying more sophisticated face detection techniques. We are thus able to speed up the process of finding the "principal cast" in broadcast news content by reducing the number of segments on which computationally more expensive face detection and recognition are employed. We present the experimental results of three clustering procedures using these descriptors on news content. The first procedure is based on a single template obtained from the centroid of the ground truth set and is the least computationally expensive. The second clustering procedure is based on multiple templates, which are the mean feature vectors of the component Gaussians of a Gaussian Mixture Model (GMM) trained to best fit the training data. The third clustering procedure is based on an HMM that captures the temporal interaction between the multiple templates. We are able to save 50% on computation, measured as the ratio of the number of rejected shots to the total number of shots, while missing 25% of the talking head shots in the news program. We also observe that the second and third clustering procedures, while slightly more computationally intensive, allow for higher pruning factors with more accuracy.
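The single-template procedure and the two figures quoted in the abstract (computation saved and talking head shots missed) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: each shot is represented by a plain numeric feature vector standing in for the Motion Activity descriptor, matching uses Euclidean distance, and a shot is rejected when its distance to the template exceeds a fixed threshold. The function names, the threshold value, and the toy data are hypothetical.

```python
from math import sqrt

def centroid(vectors):
    """Mean feature vector of the ground-truth talking head shots."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    # Assumed distance measure; the abstract does not name one.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prune_shots(shots, template, threshold):
    """Indices of shots kept for the later, more expensive face detection."""
    return [i for i, v in enumerate(shots)
            if euclidean(v, template) <= threshold]

def pruning_stats(num_shots, kept, talking_head_idx):
    """Return (rejected shots / total shots, missed talking heads / all talking heads)."""
    kept_set = set(kept)
    rejected = num_shots - len(kept_set)
    missed = sum(1 for i in talking_head_idx if i not in kept_set)
    return rejected / num_shots, missed / len(talking_head_idx)

# Toy data: three labelled talking head shots define the template.
ground_truth = [[1.0, 2.0, 1.0], [1.0, 2.0, 3.0], [1.0, 2.0, 2.0]]
template = centroid(ground_truth)

# Four unlabelled shots; indices 0 and 1 are assumed to be true talking heads.
shots = [[1.0, 2.0, 2.0], [1.0, 2.0, 2.5], [5.0, 5.0, 5.0], [9.0, 0.0, 1.0]]
kept = prune_shots(shots, template, threshold=1.0)
saving, miss_rate = pruning_stats(len(shots), kept, talking_head_idx=[0, 1])
```

Raising the threshold keeps more shots (lower saving, fewer misses); lowering it does the opposite. The multiple-template variants described in the abstract would replace the single `template` with several GMM component means, which this sketch does not attempt.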