Incremental Sparse-PCA Feature Extraction For Data Streams

Abstract

Intruders attempt to penetrate commercial systems daily and cause considerable financial losses for individuals and organizations. Intrusion detection systems monitor network events to detect computer security threats. An extensive amount of network data is devoted to detecting malicious activities.

Storing, processing, and analyzing this massive volume of data is costly, which indicates the need for efficient methods of network data reduction that do not require the data to be captured and stored first. A better approach extracts useful variables from data streams in real time and in a single pass. Removing irrelevant attributes reduces the data fed to the intrusion detection system (IDS) and shortens the analysis time while improving classification accuracy. This dissertation introduces an online, real-time data processing method for knowledge extraction.

This incremental feature extraction is based on two approaches. First, Chunk Incremental Principal Component Analysis (CIPCA) detects intrusions in data streams. Then, two novel incremental feature extraction methods, Incremental Structured Sparse PCA (ISSPCA) and Incremental Generalized Power Method Sparse PCA (IGSPCA), find malicious elements. Metrics were used to compare the performance of all methods.

IGSPCA was found to perform as well as or better than CIPCA overall in terms of dimensionality reduction, classification accuracy, and learning time. ISSPCA yielded better results for higher chunk values and greater accumulation ratio thresholds. CIPCA and IGSPCA reduced the IDS dataset to 10 principal components, as opposed to 14 eigenvectors for ISSPCA. ISSPCA is more expensive in terms of learning time than the other techniques.

This dissertation presents new methods that perform feature extraction from continuous data streams to find the small number of features necessary to express most of the data variance. Data subsets derived from a few important variables are easier to interpret.

Another goal of this dissertation was to propose incremental sparse PCA algorithms capable of processing data with concept drift and concept shift. Experiments using the WaveForm and WaveFormNoise datasets confirmed this ability. Similar to CIPCA, ISSPCA and IGSPCA updated the eigen-axes as a function of the accumulation ratio value, forming an informative eigenspace with few eigenvectors.
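To make the idea of chunk-wise updating and an accumulation ratio threshold concrete, the following is a minimal sketch and not the dissertation's CIPCA, ISSPCA, or IGSPCA. It assumes that "accumulation ratio" means the cumulative share of total variance captured by the leading eigenvectors, and it swaps in a plain running-covariance PCA (no sparsity, no eigenspace rotation update). The class name `ChunkCovariancePCA`, the chunk size of 20, and the 0.99 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


class ChunkCovariancePCA:
    """Illustrative single-pass PCA: keeps a running mean and scatter matrix."""

    def __init__(self, n_features):
        self.n = 0                                   # samples seen so far
        self.mean = np.zeros(n_features)             # running mean
        self.scatter = np.zeros((n_features, n_features))  # centered sum of squares

    def partial_fit(self, chunk):
        """Merge one chunk of rows into the running statistics."""
        chunk = np.asarray(chunk, dtype=float)
        m = chunk.shape[0]
        chunk_mean = chunk.mean(axis=0)
        centered = chunk - chunk_mean
        delta = chunk_mean - self.mean
        total = self.n + m
        # Pairwise merge of two scatter matrices (old data + new chunk).
        self.scatter += centered.T @ centered + np.outer(delta, delta) * self.n * m / total
        self.mean += delta * m / total
        self.n = total
        return self

    def components(self, threshold=0.95):
        """Return the fewest eigen-axes whose cumulative variance share >= threshold."""
        cov = self.scatter / max(self.n - 1, 1)
        eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]            # reorder to descending
        eigvals = np.clip(eigvals[order], 0.0, None)
        eigvecs = eigvecs[:, order]
        ratio = np.cumsum(eigvals) / eigvals.sum()
        k = min(int(np.searchsorted(ratio, threshold)) + 1, len(ratio))
        return eigvecs[:, :k], ratio[k - 1]


# Hypothetical stream: 5 observed features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 1e-3 * rng.normal(size=(200, 5))

model = ChunkCovariancePCA(n_features=5)
for start in range(0, len(X), 20):                   # feed the stream in chunks of 20 rows
    model.partial_fit(X[start:start + 20])

axes, ratio = model.components(threshold=0.99)
```

Each chunk is seen once and then discarded, so memory depends on the number of features rather than the length of the stream. Raising the threshold keeps more eigen-axes; the sparse variants described in the abstract would additionally constrain the loadings, which this sketch does not attempt.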

Bibliographic details

  • Author: Nziga Jean-Pierre
  • Year: 2015
  • Original format: PDF
