On-line fast kernel based methods for classification over stream data (with case studies for cyber-security)

Abstract

This thesis proposes several novel methods that address real-world stream data modelling issues through global and local modelling approaches. It reviews a set of such issues, including large data size, high dimensionality, skewed class distributions, differing data formats and visualisation, and analyses their impact on various models.

The thesis makes nine major contributions to information science: four evolving modelling methods, three real-world application systems that apply these methods, and two stream data visualisation software prototypes. The four novel methods, developed and published in the course of this study, are: (1) Online Core Vector Machines (OCVM); (2) Hierarchical CVMs (HCVM), a local modelling system based on hierarchically labelled data; (3) Dynamic Evolving CVMs (DE-CVM), a kernel-based dynamic evolving learning system; and (4) Meta-Learning String Kernel CVM.

OCVM addresses one-pass learning over large, high-dimensional stream data through a kernel-based online learning process. It is proposed for large-scale classification and leverages connections between learning and computational geometry, under the constraint that only a single pass over the data is allowed. Standard support vector machine (SVM) training has O(m^3) time and O(m^2) space complexity, where m is the training set size, and is thus computationally infeasible on very large data sets. The proposed OCVM inherits the advantages of the Core Vector Machine (CVM) algorithm, which can be used with non-linear kernels and has a time complexity that is linear in m and a space complexity that is independent of m.

HCVM solves the skewed class distribution problem for hierarchical stream data by identifying the skewed classes through sub-class clustering, creating child CVMs based on the hierarchical labels, and applying supervised learning to update the core vectors. This puts strong emphasis on the unique problem subspaces and makes parent classes easy to discriminate through local modelling on their child classes.

DE-CVM takes HCVM a step further by implementing an evolving clustering process. DE-CVM evolves through incremental, hybrid learning and accommodates new input stream data, including new features and new classes, through local element tuning. New core vectors are created and updated while the system is operating. In contrast to HCVM, DE-CVM can work not only on hierarchical data but also on any numerical stream data.

Meta-Learning String Kernel CVM is proposed for learning from string-format stream data. Recently, string-kernel-based support vector machines have shown competitive performance in tasks such as text classification and protein homology detection. Meta-Learning String Kernel CVM improves the effectiveness of traditional string-kernel SVMs by learning meta-knowledge and adopting CVMs.

The novel stream learning methods outlined above have been applied to the following three real-world data modelling problems:

  1. Hierarchical network data intrusion detection;
  2. Face membership authentication;
  3. String data (i.e. spam email, news and malicious software) classification.

These solutions constitute the main contribution of this research to the area of applied information science. In addition, two stream data visualisation systems were developed: the network intrusion detection visualisation system (NIDVS) and the HCVM prototype system. These systems overcome the difficulty of monitoring stream data learning progress and also provide a better understanding of local modelling.

In summary, real-world problems consist of many smaller problems. It was found beneficial to acknowledge the existence of these sub-problems and to address them through local models. The core vectors extracted from the local models also make new knowledge available to researchers and would allow more in-depth study of the sub-problems in future research.
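
The abstract does not describe the OCVM algorithm itself, so the following is only an illustration of the stated complexity profile (one pass, time linear in m, memory independent of m). CVM is usually presented as recasting kernel training as a minimum enclosing ball problem; the sketch below, as an assumption made for illustration, uses a simple and well-known one-pass enclosing ball update in plain input space, without kernels. The names `StreamingBall` and `fit_one_pass` are hypothetical and do not come from the thesis.

```python
import math
from typing import Iterable, List, Optional, Sequence


class StreamingBall:
    """One-pass approximate enclosing ball (illustrative, not the thesis's OCVM).

    Only a centre and a radius are stored, so memory does not grow with
    the number of points seen.
    """

    def __init__(self) -> None:
        self.center: Optional[List[float]] = None
        self.radius: float = 0.0

    def update(self, point: Sequence[float]) -> None:
        """Process one stream element in O(dimension) time."""
        if self.center is None:
            # The first point defines a ball of radius zero.
            self.center = list(point)
            return
        dist = math.dist(self.center, point)
        if dist <= self.radius:
            return  # point is already covered; nothing to do
        # Smallest ball that contains both the old ball and the new point:
        # move the centre towards the point and grow the radius.
        shift = (dist - self.radius) / (2.0 * dist)
        self.center = [c + shift * (p - c) for c, p in zip(self.center, point)]
        self.radius = (self.radius + dist) / 2.0

    def contains(self, point: Sequence[float], tol: float = 1e-9) -> bool:
        """Return True if the point lies inside the ball, up to a tolerance."""
        if self.center is None:
            return False
        return math.dist(self.center, point) <= self.radius + tol


def fit_one_pass(stream: Iterable[Sequence[float]]) -> StreamingBall:
    """Consume the stream exactly once, one update per element."""
    ball = StreamingBall()
    for point in stream:
        ball.update(point)
    return ball


if __name__ == "__main__":
    ball = fit_one_pass([(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)])
    print(ball.center, ball.radius)
```

Each element is examined once and then discarded, which is what gives the linear running time and constant memory. The resulting ball is an approximation rather than the exact minimum enclosing ball, and a kernel-based method would have to carry out the corresponding update in feature space, which this sketch does not attempt.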

Bibliographic record

  • Author

    Chen Ye (Gary)

  • Year 2012
  • Original format PDF
  • Text language English