Conference: IEEE Winter Conference on Applications of Computer Vision

Detecting the Starting Frame of Actions in Video


Abstract

In this work, we address the problem of precisely localizing key frames of an action, for example, the precise time that a pitcher releases a baseball, or the precise time that a crowd begins to applaud. Key frame localization is a largely overlooked yet important action-recognition problem. It matters, for example, in neuroscience, where we would like to understand the neural activity that produces the start of a bout of an action. To address this problem, we introduce a novel structured loss function that properly weights the types of errors that matter in such applications: it penalizes extra and missed action start detections more heavily than small misalignments. Our structured loss is based on the best matching between predicted and labeled action starts. We train recurrent neural networks (RNNs) to minimize differentiable approximations of this loss. To evaluate these methods, we introduce the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions. The dataset was collected and labeled by experts for the purpose of neuroscience research. On this dataset, we demonstrate that our method outperforms related approaches and baseline methods that use an unstructured loss.
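
The matching idea in the abstract can be illustrated with a small sketch. This is not the authors' loss: it is a plain, non-differentiable cost, and the function name `matching_cost` and the parameters `miss_cost`, `extra_cost`, and `max_dist` are hypothetical, chosen only for illustration. It assumes action starts are given as frame indices on a single time axis, so an order-preserving alignment (a dynamic program similar to edit distance) is enough to find the cheapest matching.

```python
def matching_cost(pred, true, miss_cost=10.0, extra_cost=10.0, max_dist=None):
    """Cost of the cheapest order-preserving matching between predicted
    and labeled action-start frames (illustrative sketch only).

    A matched pair costs its frame distance. An unmatched prediction costs
    `extra_cost`; an unmatched label costs `miss_cost`. If `max_dist` is
    given, pairs farther apart than that may not be matched.
    """
    pred, true = sorted(pred), sorted(true)
    n, m = len(pred), len(true)

    # dp[i][j]: minimum cost of aligning the first i predictions
    # with the first j labels.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * extra_cost   # only extra detections
    for j in range(1, m + 1):
        dp[0][j] = j * miss_cost    # only missed labels

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Leave the i-th prediction or the j-th label unmatched.
            best = min(dp[i - 1][j] + extra_cost, dp[i][j - 1] + miss_cost)
            # Or match them, paying the misalignment in frames.
            dist = abs(pred[i - 1] - true[j - 1])
            if max_dist is None or dist <= max_dist:
                best = min(best, dp[i - 1][j - 1] + dist)
            dp[i][j] = best
    return dp[n][m]


if __name__ == "__main__":
    # Two small misalignments (2 + 1 frames) cost less than one extra detection.
    print(matching_cost([100, 250], [102, 251]))
    print(matching_cost([100, 250, 400], [102, 251]))
```

With these illustrative default costs, a spurious or missed start is charged as much as a ten-frame misalignment, which mirrors the abstract's stated priority of penalizing extra and missed detections more heavily than small timing errors.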
