International Conference on Intelligent Control and Information Processing

A Gaussian Data Augmentation Technique on Highly Dimensional, Limited Labeled Data for Multiclass Classification Using Deep Learning


Abstract

In recent years, using oceans of data and virtually infinite cloud-based computation power, deep learning models have advanced the state of the art in classification to reach expert-level performance. Researchers continue to explore applications of deep learning models ranging from face, text, and voice recognition to signal and information processing. With continuously increasing data collection capabilities, datasets are becoming larger and higher-dimensional. However, manually labeled data points cannot keep up. It is this disparity between the high number of features and the low number of labeled samples that motivates a new approach that integrates feature reduction and sample augmentation into deep learning classifiers. This paper explores the performance of such an approach on three deep learning classifiers: MLP, CNN, and LSTM. First, we establish a baseline using the original dataset. Second, we preprocess the dataset using principal component analysis (PCA). Third, we preprocess the dataset with PCA followed by our Gaussian data augmentation (GDA) technique. To estimate performance, we add k-fold cross-validation to our experiments and compile our results in numerical and graphical form using the confusion matrix and a classification report that includes accuracy, recall, f-score, and support. Our experiments suggest superior classification accuracy for all three classifiers when our PCA+GDA approach is applied.
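The abstract does not spell out how GDA generates new samples, so the following is only a minimal sketch under a stated assumption: each synthetic sample is an original sample plus zero-mean Gaussian noise, with the noise sized per feature, and the class label carried over unchanged. The function name `gaussian_augment` and the parameters `copies`, `scale`, and `seed` are hypothetical and do not come from the paper. The input is presumed to be the feature rows after PCA has already been applied.

```python
import random
import statistics


def gaussian_augment(X, y, copies=2, scale=0.1, seed=0):
    """Return X and y extended with noisy copies of every sample.

    Assumption (not taken from the paper): each synthetic sample is the
    original plus zero-mean Gaussian noise whose standard deviation is
    `scale` times the standard deviation of that feature across X.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Per-feature spread, used to size the noise for each column.
    stds = [statistics.pstdev([row[j] for row in X]) for j in range(n_features)]

    X_aug = [list(row) for row in X]  # originals come first, untouched
    y_aug = list(y)
    for _ in range(copies):
        for row, label in zip(X, y):
            noisy = [v + rng.gauss(0.0, scale * s) for v, s in zip(row, stds)]
            X_aug.append(noisy)
            y_aug.append(label)  # the label is kept unchanged
    return X_aug, y_aug


# Hypothetical usage: three PCA-reduced samples, one per class.
X_reduced = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
labels = [0, 1, 2]
X_big, y_big = gaussian_augment(X_reduced, labels, copies=2)
print(len(X_big), len(y_big))
```

When this is combined with k-fold cross-validation, one reasonable choice is to augment only the training folds, so that noisy copies of a held-out sample never appear in training. The abstract does not say how the authors handled this.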
