Journal: Quality Control, Transactions

DDaNet: Dual-Path Depth-Aware Attention Network for Fingerspelling Recognition Using RGB-D Images



Abstract

Automatic fingerspelling recognition aims to overcome communication barriers between people who are deaf and those who can hear. RGB-D cameras are widely used to handle finger occlusion, which usually hinders fingerspelling recognition. However, color-depth misalignment, an intrinsic property of RGB-D cameras, prevents the simultaneous processing of color and depth images when the camera's intrinsic parameters are unavailable. Furthermore, fine-grained hand gestures performed by different persons and captured from multiple views make discriminative feature extraction difficult, owing to intra-class variability and inter-class similarity. Inspired by the human visual mechanism, we propose a network that learns discriminative features for fine-grained hand gestures while suppressing the effect of color-depth misalignment. Unlike existing approaches that process RGB-D images independently, the proposed dual-path depth-aware attention network learns a fingerspelling representation in separate RGB and depth paths and progressively fuses the features learned from the two paths. Because the hand is usually the object closest to the camera, depth information can help emphasize the key fingers that form a letter sign. We therefore develop a depth-aware attention module (DAM) that exploits spatial relations in the depth feature maps, refining the RGB and depth feature maps through a bottleneck structure. The module establishes a lateral connection between the RGB and depth paths and provides a depth-aware saliency map to both paths. Experimental results demonstrate that the proposed network improves accuracy (+0.83%) and F-score (+1.55%) over state-of-the-art methods on a publicly available fingerspelling dataset. Visualization of the network's processes shows that the DAM facilitates the selection of representative hand regions in the RGB-D images.
Furthermore, the number of parameters and the computational overhead of the DAM are negligible within the network. The code is available at https://github.com/cweizen/cweizen-DDaNet_model_master.
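The core idea of the DAM, as described above, is to derive a spatial saliency map from the depth feature maps and use it to gate both the RGB and depth paths through a lateral connection. The following NumPy sketch illustrates that gating step only; the pooling choices and the stand-in for the module's learned bottleneck convolution are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_aware_attention(rgb_feat, depth_feat):
    """Illustrative sketch of a depth-aware attention step: pool the
    depth feature maps across channels into a spatial descriptor,
    squash it into a (0, 1) saliency map, and reweight both the RGB
    and depth feature maps with it.

    rgb_feat, depth_feat: arrays of shape (C, H, W).
    Returns the refined (rgb, depth) feature maps and the saliency map.
    """
    # Spatial descriptors of the depth path: channel-wise mean and max.
    avg_pool = depth_feat.mean(axis=0, keepdims=True)  # (1, H, W)
    max_pool = depth_feat.max(axis=0, keepdims=True)   # (1, H, W)
    # The paper's learned bottleneck would map these descriptors to a
    # single-channel map; a plain average stands in for it here
    # (assumption for illustration only).
    saliency = sigmoid(0.5 * (avg_pool + max_pool))    # (1, H, W)
    # The saliency map gates both paths via the lateral connection,
    # broadcasting over the channel dimension.
    return rgb_feat * saliency, depth_feat * saliency, saliency

rgb = np.random.rand(8, 4, 4)
depth = np.random.rand(8, 4, 4)
rgb_out, depth_out, sal = depth_aware_attention(rgb, depth)
print(rgb_out.shape, sal.shape)  # (8, 4, 4) (1, 4, 4)
```

Because the saliency values lie in (0, 1), regions that the depth path deems unimportant (e.g. the background, which is farther from the camera than the hand) are attenuated in both paths, which matches the abstract's description of emphasizing the key fingers.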


