IEEE Transactions on Pattern Analysis and Machine Intelligence

Framework for Performance Evaluation of Face, Text, and Vehicle Detection and Tracking in Video: Data, Metrics, and Protocol


Abstract

Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video, specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the quantities of data needed for robust machine learning approaches, as well as statistically significant comparisons of algorithm performance. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes lasting resources of a scale that will prove extremely useful to the computer vision research community for years to come.
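The abstract describes scoring software for comparing detections against ground-truth annotations but does not spell out the metrics themselves. As a rough illustration only (the function names, the greedy matcher, and the 0.5 overlap threshold are assumptions, not the paper's actual protocol), a frame-level detection score built on spatial intersection-over-union might look like:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def frame_score(gt_boxes, det_boxes, thresh=0.5):
    """Greedily match each ground-truth box to at most one detection.

    Returns (true positives, false positives, missed ground truths)
    for a single annotated frame; hypothetical helper, not the
    paper's published scoring software.
    """
    unmatched = list(det_boxes)
    tp = 0
    for g in gt_boxes:
        best, best_iou = None, thresh
        for d in unmatched:
            v = iou(g, d)
            if v >= best_iou:
                best, best_iou = d, v
        if best is not None:
            unmatched.remove(best)  # each detection matches once
            tp += 1
    return tp, len(unmatched), len(gt_boxes) - tp
```

Per-frame counts like these would then be aggregated over a clip (and over the 50-clip test set) into summary detection and tracking scores; the corpus size quoted above is consistent with 100 clips of roughly 2.5 minutes at about 30 frames per second.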
