
Incremental Sparse-PCA Feature Extraction For Data Streams.

Abstract

Intruders attempt to penetrate commercial systems daily and cause considerable financial losses for individuals and organizations. Intrusion detection systems monitor network events to detect computer security threats. An extensive amount of network data is devoted to detecting malicious activities.

Storing, processing, and analyzing this massive volume of data is costly, which indicates the need for efficient network data reduction methods that do not require the data to be captured and stored first. A better approach extracts useful variables from data streams in real time and in a single pass. Removing irrelevant attributes reduces the data fed to the intrusion detection system (IDS) and shortens the analysis time while improving classification accuracy. This dissertation introduces an online, real-time data processing method for knowledge extraction.

This incremental feature extraction is based on two approaches. First, Chunk Incremental Principal Component Analysis (CIPCA) detects intrusions in data streams. Then, two novel incremental feature extraction methods, Incremental Structured Sparse PCA (ISSPCA) and Incremental Generalized Power Method Sparse PCA (IGSPCA), find malicious elements. Performance metrics were used to compare all methods.

IGSPCA was found to perform as well as or better than CIPCA overall in terms of dimensionality reduction, classification accuracy, and learning time. ISSPCA yielded better results for higher chunk values and greater accumulation ratio thresholds. CIPCA and IGSPCA reduced the IDS dataset to 10 principal components, as opposed to 14 eigenvectors for ISSPCA. ISSPCA is more expensive in terms of learning time than the other techniques.

This dissertation presents new methods that perform feature extraction from continuous data streams to find the small number of features necessary to express most of the data variance. Data subsets derived from a few important variables are easier to interpret.

Another goal of this dissertation was to propose incremental sparse PCA algorithms capable of processing data with concept drift and concept shift. Experiments using the WaveForm and WaveFormNoise datasets confirmed this ability. Similar to CIPCA, ISSPCA and IGSPCA updated the eigen-axes as a function of the accumulation ratio value, forming an informative eigenspace with few eigenvectors.
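The role of the accumulation ratio threshold can be illustrated with a small sketch. It assumes the accumulation ratio is the fraction of total variance captured by the leading eigenvalues, as is common in the chunk incremental PCA literature; the abstract itself does not give the formula. The function names and the eigenvalue spectrum are hypothetical, and this is not the dissertation's implementation.

```python
def accumulation_ratio(eigenvalues, k):
    """Fraction of total variance captured by the k largest eigenvalues.

    Assumed definition: sum of the top-k eigenvalues over the sum of all.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return sum(ordered[:k]) / total


def components_needed(eigenvalues, threshold):
    """Smallest k whose accumulation ratio reaches the threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    for k in range(1, len(eigenvalues) + 1):
        if accumulation_ratio(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalue spectrum, for illustration only.
spectrum = [5.0, 3.0, 1.0, 0.5, 0.5]
k = components_needed(spectrum, threshold=0.85)
print(f"{k} eigen-axes retained")
```

Under this assumed definition, raising the threshold can only keep the number of retained eigen-axes the same or increase it, so the threshold trades a more compact eigenspace against the share of variance preserved.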

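Bibliographic record

  • Author

    Nziga, Jean-Pierre

  • Author affiliation

    Nova Southeastern University

  • Degree-granting institution: Nova Southeastern University
  • Subject: Computer science; Information science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 127 p.
  • Total pages: 127
  • Original format: PDF
  • Language of text: English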
