Journal: Pattern Recognition Letters

Semantic Object Classes In Video: A High-definition Ground Truth Database

Abstract

Visual object analysis researchers are increasingly experimenting with video, because it is expected that motion cues should help with detection, recognition, and other analysis tasks. This paper presents the Cambridge-driving Labeled Video Database (CamVid) as the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.

The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over 10 min of high quality 30 Hz footage is being provided, with corresponding semantically labeled images at 1 Hz and, in part, 15 Hz.

The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we present custom-made labeling software for assisting users who wish to paint precise class-labels for other images and videos. We evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
