Conference: IEEE Conference on Computer Vision and Pattern Recognition

The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities

Abstract

This paper describes a framework for modeling human activities as temporally structured processes. Our approach is motivated by the inherently hierarchical nature of human activities and the close correspondence between human actions and speech: We model action units using Hidden Markov Models, much like words in speech. These action units then form the building blocks to model complex human activities as sentences using an action grammar. To evaluate our approach, we collected a large dataset of daily cooking activities: The dataset includes 52 participants, each performing 10 cooking activities in multiple real-life kitchens, resulting in over 77 hours of video footage. We evaluate the HTK toolkit, a state-of-the-art speech recognition engine, in combination with multiple video feature descriptors, for both the recognition of cooking activities (e.g., making pancakes) and the semantic parsing of videos into action units (e.g., cracking eggs). Our results demonstrate the benefits of structured temporal generative approaches over existing discriminative approaches in coping with the complexity of human daily life activities.
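
The idea of treating action units as "words" and activities as "sentences" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from HTK: each action unit is reduced to a single HMM state emitting discrete "visual words", the grammar maps each activity to one fixed unit sequence, and the names `EMISSIONS`, `GRAMMAR`, `STAY`, `viterbi_parse`, and `recognize` are hypothetical. The paper itself relies on HTK with video feature descriptors.

```python
import math

# Hypothetical emission model: one HMM state per action unit, each emitting
# a discrete "visual word". Real systems would use multi-state HMMs over
# continuous video features.
WORDS = ["reach", "crack", "stir", "pour"]


def _emit(main_word):
    # The unit's characteristic word gets 0.7; every other word gets 0.1.
    return {w: (0.7 if w == main_word else 0.1) for w in WORDS}


EMISSIONS = {
    "take_bowl": _emit("reach"),
    "crack_egg": _emit("crack"),
    "stir": _emit("stir"),
    "pour": _emit("pour"),
}

# Hypothetical action grammar: each activity expands to one unit sequence.
GRAMMAR = {
    "make_pancake": ["take_bowl", "crack_egg", "stir", "pour"],
    "make_scrambled_egg": ["take_bowl", "crack_egg", "stir"],
}

STAY = 0.8  # assumed self-transition probability; 1 - STAY advances a unit
NEG_INF = float("-inf")


def viterbi_parse(units, obs):
    """Best left-to-right alignment of frames to units.

    Returns (log probability, per-frame unit labels). The path must start
    in the first unit and end in the last one.
    """
    n = len(units)
    score = [NEG_INF] * n
    score[0] = math.log(EMISSIONS[units[0]][obs[0]])
    backpointers = []
    for o in obs[1:]:
        new_score = [NEG_INF] * n
        ptr = [0] * n
        for j in range(n):
            stay = score[j] + math.log(STAY)
            move = score[j - 1] + math.log(1 - STAY) if j > 0 else NEG_INF
            if move > stay:
                best, ptr[j] = move, j - 1
            else:
                best, ptr[j] = stay, j
            if best > NEG_INF:
                new_score[j] = best + math.log(EMISSIONS[units[j]][o])
        score = new_score
        backpointers.append(ptr)
    if score[n - 1] == NEG_INF:
        return NEG_INF, []  # too few frames to visit every unit
    j = n - 1
    path = [j]
    for ptr in reversed(backpointers):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return score[n - 1], [units[k] for k in path]


def recognize(obs):
    """Pick the activity whose unit sequence best explains the frames."""
    results = {a: viterbi_parse(u, obs) for a, u in GRAMMAR.items()}
    best = max(results, key=lambda a: results[a][0])
    return best, results[best][1]
```

Calling `recognize` on a list of per-frame visual words returns both the activity label (the recognition task) and a per-frame unit labeling (the parsing task), which mirrors the two evaluations named in the abstract at toy scale.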