Robust multimodal collaborative visual recognition with missing data

Abstract

Data contamination is one of the typical difficulties that computer vision practitioners encounter in real-world applications. With the increasing popularity of multi-view learning, the problem of imperfect data is even more pronounced. Specifically, random features or even entire sensing channels may be missing in the testing phase, possibly due to interference or bandwidth limits. In addition, cross-view paired correspondences may be missing in the training data.

In this thesis, a series of missing-data-robust multi-view visual recognition methods is proposed to address these challenges. For systematically and randomly missing features in the testing data, a latent-space-based multi-view learning framework is developed. Paired with two types of information-preserving projection and manifold embedding algorithms, this framework effectively addresses the aforementioned data degradations and achieves superior recognition performance.

Inspired by Regularized Generalized Canonical Correlation Analysis (RGCCA) and label information encoding, Discriminative Canonical Correlation Analysis (DCCA) is proposed as the first type of supervised embedding algorithm. Alternatively, inspired by the recent success of metric learning and domain transfer learning, Similarity Learning Canonical Correlation Analysis (SLCCA) is proposed to optimize the latent space with explicit category-preserving optimization constraints.

Two variants of the aforementioned missing data problem are also considered. Besides missing features in the testing phase, correspondences among the training data may be missing. For this case, a new algorithm is proposed that combines an effective subspace alignment technique with supervised information-preserving embedding based on the squared-loss mutual information criterion. Alternatively, an asymmetric multimodal Convolutional Neural Network-based approach is proposed to jointly reconstruct the feature residuals and carry out classification.

With these recognition frameworks and new algorithms, multimodal visual classification is carried out on multiple benchmark datasets. We demonstrate that these methods are robust against various data imperfections and outperform common baselines.
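
The latent-space idea described in the abstract can be illustrated with a minimal sketch. It uses plain regularized two-view CCA as a stand-in, not the thesis's DCCA or SLCCA, and every name in it (`fit_cca`, `demo`, the synthetic data, the nearest-centroid classifier) is a hypothetical choice made for illustration. Two views are projected into a shared space during training; at test time one view is treated as an entirely missing channel, and classification uses only the remaining view's projection.

```python
import numpy as np

def _inv_sqrt(c):
    # Inverse matrix square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(x, y, k, reg=1e-3):
    """Plain regularized two-view CCA (illustrative stand-in, not DCCA/SLCCA)."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    return {"mx": mx, "my": my, "wx": kx @ u[:, :k],
            "wy": ky @ vt.T[:, :k], "corr": s[:k]}

def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Assumed synthetic setup: three classes in a 2-D latent space, seen through two views.
    means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    a, b = rng.normal(size=(2, 6)), rng.normal(size=(2, 5))

    def sample(n):
        labels = rng.integers(0, 3, size=n)
        z = means[labels] + 0.5 * rng.normal(size=(n, 2))
        x = z @ a + 0.1 * rng.normal(size=(n, 6))
        y = z @ b + 0.1 * rng.normal(size=(n, 5))
        return x, y, labels

    x_tr, y_tr, l_tr = sample(300)
    _, y_te, l_te = sample(150)  # view x is discarded: a missing channel at test time
    model = fit_cca(x_tr, y_tr, k=2)

    # Class centroids are learned from view x in the shared latent space.
    z_tr = (x_tr - model["mx"]) @ model["wx"]
    centroids = np.stack([z_tr[l_tr == c].mean(axis=0) for c in range(3)])

    # Test samples are classified from view y alone.
    z_te = (y_te - model["my"]) @ model["wy"]
    dists = np.linalg.norm(z_te[:, None, :] - centroids[None, :, :], axis=2)
    acc = float((dists.argmin(axis=1) == l_te).mean())
    return acc, model["corr"]
```

Calling `demo()` returns the test accuracy and the two leading canonical correlations. Because both views are whitened into the same correlated coordinates, centroids learned from one view remain usable when only the other view is observed, which is the property a latent-space framework of this kind relies on.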


Bibliographic record

Author: Zhang, Qilin
Author affiliation: Stevens Institute of Technology
Degree-granting institution: Stevens Institute of Technology
Subject: Computer science; Artificial intelligence
Degree: Ph.D.
Year: 2016
Pages: 130 p.
Total pages: 130
Original format: PDF
Language of text: eng

Similar literature

1. Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data [J]. Wagner Johannes, Andre Elisabeth, Lingenfelser Florian. IEEE Transactions on Affective Computing. 2011, No. 4
2. Noise robust speech recognition system using multimodal audio-visual approach using different deep learning classification techniques [J]. Eslam E. El Maghraby, Amr M. Gody, Mohamed Hesham Farouk. International Journal of Advanced Computer Research. 2020, No. 47
3. Robust Multimodal Person Recognition Using Low-Complexity Audio-Visual Feature Fusion Approaches [J]. Dhaval Shah, Kyu J. Han, Shrikanth S. Narayanan. International Journal of Semantic Computing. 2010, No. 2
4. Multimodal learning using 3D audio-visual data for audio-visual speech recognition [C]. Rongfeng Su, Lan Wang, Xunying Liu. International Conference on Asian Language Processing. 2017
5. Robust Pose Estimation of a Robotic Navigation Aid for the Visually Impaired by Multimodal Data Fusion [D]. Zhang, He. 2018
6. Analysis of different affective state multimodal recognition approaches with missing data-oriented to virtual learning environments [O]. Camilo Salazar, Edwin Montoya-Múnera, Jose Aguilar. 2021
7. Noise-Robust Speech Recognition System based on Multimodal Audio-Visual Approach Using Different Deep Learning Classification Techniques [O]. Eslam ElMaghraby, Amr Gody, Mohamed Farouk. 2020
8. Joint Sparse Representation for Robust Multimodal Biometrics Recognition [R]. Patel, V. M., Nasrabadi, N. M., Chellappa, R. 2014
