Journal: Quality Control, Transactions

DDaNet: Dual-Path Depth-Aware Attention Network for Fingerspelling Recognition Using RGB-D Images



Abstract

Automatic fingerspelling recognition aims to overcome communication barriers between people who are deaf and those who can hear. RGB-D cameras are widely used to handle finger occlusion, which usually hinders fingerspelling recognition. However, color-depth misalignment, an intrinsic property of RGB-D cameras, prevents the simultaneous processing of color and depth images when the camera's intrinsic parameters are unavailable. Furthermore, fine-grained hand gestures performed by different persons and captured from multiple views make discriminative feature extraction difficult, owing to intra-class variability and inter-class similarity. Inspired by the human visual mechanism, we propose a network that learns discriminative features for fine-grained hand gestures while suppressing the effect of color-depth misalignment. Unlike existing approaches that process RGB-D images independently, the proposed dual-path depth-aware attention network learns a fingerspelling representation in separate RGB and depth paths and progressively fuses the features learned from the two paths. Because the hand is usually the object closest to the camera, depth information can help emphasize the key fingers that form a letter sign. We therefore develop a depth-aware attention module (DAM) that exploits spatial relations in the depth feature maps, refining the RGB and depth feature maps through a bottleneck structure. The module establishes a lateral connection between the RGB and depth paths and provides a depth-aware saliency map to both paths. Experimental results demonstrate that the proposed network improves accuracy (+0.83%) and F-score (+1.55%) over state-of-the-art methods on a publicly available fingerspelling dataset. Visualization of the network's processes shows that the DAM facilitates the selection of representative hand regions in the RGB-D images.
Furthermore, the number of parameters and the computational overhead of the DAM are negligible within the network. The code is available at https://github.com/cweizen/cweizen-DDaNet_model_master.
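The core idea of the DAM, as described above, is to derive a spatial saliency map from the depth feature maps and use it to gate both the RGB and depth paths through a lateral connection. The following NumPy sketch illustrates that gating step only; the pooling choices and the stand-in for the module's learned bottleneck convolution are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_aware_attention(rgb_feat, depth_feat):
    """Illustrative sketch of a depth-aware attention step: pool the
    depth feature maps across channels into a spatial descriptor,
    squash it into a (0, 1) saliency map, and reweight both the RGB
    and depth feature maps with it.

    rgb_feat, depth_feat: arrays of shape (C, H, W).
    Returns the refined (rgb, depth) feature maps and the saliency map.
    """
    # Spatial descriptors of the depth path: channel-wise mean and max.
    avg_pool = depth_feat.mean(axis=0, keepdims=True)  # (1, H, W)
    max_pool = depth_feat.max(axis=0, keepdims=True)   # (1, H, W)
    # The paper's learned bottleneck would map these descriptors to a
    # single-channel map; a plain average stands in for it here
    # (assumption for illustration only).
    saliency = sigmoid(0.5 * (avg_pool + max_pool))    # (1, H, W)
    # The saliency map gates both paths via the lateral connection,
    # broadcasting over the channel dimension.
    return rgb_feat * saliency, depth_feat * saliency, saliency

rgb = np.random.rand(8, 4, 4)
depth = np.random.rand(8, 4, 4)
rgb_out, depth_out, sal = depth_aware_attention(rgb, depth)
print(rgb_out.shape, sal.shape)  # (8, 4, 4) (1, 4, 4)
```

Because the saliency values lie in (0, 1), regions that the depth path deems unimportant (e.g. the background, which is farther from the camera than the hand) are attenuated in both paths, which matches the abstract's description of emphasizing the key fingers.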


