Journal: Electronic Letters on Computer Vision and Image Analysis (ELCVIA)

From pixels to gestures: learning visual representations for human analysis in color and depth data sequences

Abstract

The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others. In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction, from pixels to gestures: human segmentation, human pose estimation and gesture recognition. First, we show how binary segmentation (object vs. background) of the human body in image sequences helps to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces the spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a multi-label segmentation framework for obtaining much more detailed segmentation masks: instead of a binary representation separating the human body from the background, finer segmentation masks are obtained that separate the different body parts. At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, which in turn limits the expressiveness of the model. To cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets. Finally, we propose a framework for gesture recognition based on the bag-of-visual-words model. We leverage the benefits of the RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors, and spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences: a Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both the spatial and time domains.
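As a rough illustration of the late-fusion strategy mentioned in the abstract, the sketch below builds a separate bag-of-visual-words vocabulary for RGB and for depth descriptors, trains one classifier per modality, and combines their scores at decision level. The vocabulary sizes, the linear SVM classifiers and the fusion weight `alpha` are illustrative assumptions, not the dissertation's actual configuration.

```python
# A minimal sketch of late fusion of modality-specific bag-of-visual-words
# representations (RGB + depth), assuming local descriptors are already extracted.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def build_vocabulary(descriptors, k=64, seed=0):
    """Cluster local descriptors (N x D) into a k-word visual vocabulary."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(descriptors)


def encode_bovw(descriptors, vocabulary):
    """Quantize the descriptors of one sequence into a normalized word histogram."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def late_fusion_scores(rgb_clf, depth_clf, rgb_hist, depth_hist, alpha=0.5):
    """Combine the two modality-specific classifiers at the score level."""
    s_rgb = rgb_clf.decision_function([rgb_hist])
    s_depth = depth_clf.decision_function([depth_hist])
    return alpha * s_rgb + (1.0 - alpha) * s_depth


# Toy usage with random descriptors standing in for real RGB / depth features.
rng = np.random.default_rng(0)
rgb_vocab = build_vocabulary(rng.normal(size=(500, 32)))
depth_vocab = build_vocabulary(rng.normal(size=(500, 16)))

# One BoVW histogram per training sequence and modality, plus gesture labels.
X_rgb = np.stack([encode_bovw(rng.normal(size=(80, 32)), rgb_vocab) for _ in range(40)])
X_depth = np.stack([encode_bovw(rng.normal(size=(80, 16)), depth_vocab) for _ in range(40)])
y = np.arange(40) % 3

rgb_clf = SVC(kernel="linear").fit(X_rgb, y)
depth_clf = SVC(kernel="linear").fit(X_depth, y)

test_rgb = encode_bovw(rng.normal(size=(80, 32)), rgb_vocab)
test_depth = encode_bovw(rng.normal(size=(80, 16)), depth_vocab)
fused = late_fusion_scores(rgb_clf, depth_clf, test_rgb, test_depth)
print("predicted gesture:", rgb_clf.classes_[fused.argmax()])
```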
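Similarly, the Gaussian-based probabilistic reformulation of Dynamic Time Warping could be sketched as follows: each frame of a reference gesture is modelled with a diagonal Gaussian learnt from aligned training examples, frame costs become negative log-likelihoods, and a subsequence-DTW pass over a continuous stream flags segments whose length-normalized cost falls below a threshold. The diagonal-Gaussian model, the recursion and the threshold value are illustrative assumptions rather than the dissertation's exact formulation.

```python
# A minimal sketch of Gaussian-based probabilistic DTW for spotting a gesture in
# a continuous stream of per-frame feature vectors.
import numpy as np


def learn_gesture_model(aligned_examples):
    """Per-frame mean/variance from training sequences aligned to a common length.

    aligned_examples: array of shape (n_examples, n_frames, dim).
    """
    mu = aligned_examples.mean(axis=0)                    # (n_frames, dim)
    var = aligned_examples.var(axis=0) + 1e-6             # avoid zero variance
    return mu, var


def frame_cost(stream, mu, var):
    """Negative Gaussian log-likelihood of every stream frame under every model frame."""
    # stream: (T, dim); mu, var: (M, dim) -> cost matrix of shape (M, T)
    diff = stream[None, :, :] - mu[:, None, :]            # (M, T, dim)
    nll = 0.5 * ((diff ** 2) / var[:, None, :] + np.log(2 * np.pi * var[:, None, :])).sum(-1)
    return nll


def spot_gesture(stream, mu, var, threshold):
    """Subsequence DTW: return (cost, end_frame) of detections below the threshold."""
    cost = frame_cost(stream, mu, var)                    # (M, T)
    M, T = cost.shape
    D = np.full((M, T), np.inf)
    D[0, :] = cost[0, :]                                  # a match may start at any frame
    for i in range(1, M):
        for t in range(1, T):
            D[i, t] = cost[i, t] + min(D[i - 1, t - 1], D[i - 1, t], D[i, t - 1])
    end_costs = D[-1, :] / M                              # length-normalized path cost
    return [(c, t) for t, c in enumerate(end_costs) if c < threshold]


# Toy usage: a 20-frame gesture model spotted inside a 100-frame random stream.
rng = np.random.default_rng(1)
train = rng.normal(size=(5, 20, 8))                       # 5 aligned training examples
mu, var = learn_gesture_model(train)
stream = rng.normal(size=(100, 8))
stream[40:60] = mu + 0.1 * rng.normal(size=(20, 8))       # embed a gesture instance
detections = spot_gesture(stream, mu, var, threshold=8.0)
print("candidate end frames:", [t for _, t in detections])
```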
