Person count localization in videos from noisy foreground and detections





This paper formulates and presents a solution to a new problem called person count localization. Given a video of a crowded scene, our goal is to output, for each frame, a set of: 1) detections optimally covering both isolated individuals and cluttered groups of people; and 2) counts of people inside these detections. This problem is a middle ground between frame-level person counting, which does not localize counts, and person detection, which aims to perfectly localize people with count-one detections. Our problem formulation is important for a wide range of domains where people frequently appear under severe occlusion within a crowd. As these crowds are often visually distinct from the rest of the scene, they can be viewed as “visual phrases” whose spatially tight localization and count assignment could facilitate higher-level video understanding. For count localization, we specify a novel framework of iterative error-driven revisions of a flow graph derived from noisy input of people detections and foreground segmentation. Each iteration creates and solves an integer program for count localization on the current revision of the flow graph. The graph revisions are based on detected violations of basic integrity constraints. These violations in turn trigger learned modifications to the graph aimed at reducing noise in the input features. For evaluation, we introduce a new metric that measures both the count precision and the localization of our approach on American football and pedestrian videos.
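As an illustration of the output described in the abstract, the following minimal sketch shows one possible representation of per-frame counted detections. The names `CountedDetection`, `FrameOutput`, `frame_level_count`, and `is_pure_person_detection`, the box layout, and the example values are all assumptions made for illustration; they are not taken from the paper. The sketch only shows how the two neighboring problems relate to this output: frame-level counting corresponds to summing the counts, and classical person detection corresponds to the case where every count is one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CountedDetection:
    """One localized count: a box and the number of people inside it (hypothetical type)."""
    box: Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)
    count: int  # 1 for an isolated person, more than 1 for an occluded group


# Hypothetical container: frame index -> counted detections in that frame.
FrameOutput = Dict[int, List[CountedDetection]]


def frame_level_count(detections: List[CountedDetection]) -> int:
    """Collapse localized counts into the single number a frame-level counter reports."""
    return sum(d.count for d in detections)


def is_pure_person_detection(detections: List[CountedDetection]) -> bool:
    """True when every box has count one, i.e. the classical detection setting."""
    return all(d.count == 1 for d in detections)


# Made-up example: one isolated person and one group of four in frame 0.
output: FrameOutput = {
    0: [
        CountedDetection(box=(10.0, 20.0, 40.0, 90.0), count=1),
        CountedDetection(box=(100.0, 30.0, 220.0, 120.0), count=4),
    ],
}
```

In this made-up frame, the localized output keeps the group of four as a single box with its count, whereas a frame-level counter would report only the total and a count-one detector would need four separate, heavily occluded boxes.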



