
On-line fast kernel based methods for classification over stream data (with case studies for cyber-security)


Abstract

This thesis proposes several novel methods that address real-world stream data modelling issues through global and local modelling approaches. It reviews a set of such issues, including large data size, high dimensionality, skewed class distributions, differing data formats, and visualisation, and analyses their impact on various models.

The thesis makes nine major contributions to information science: four evolving modelling methods, three real-world application systems that apply these methods, and two stream data visualisation software prototypes. Four novel methods have been developed and published in the course of this study. They are: (1) Online Core Vector Machines (OCVM); (2) Hierarchical CVMs (HCVM), a local modelling system based on hierarchically labelled data; (3) Dynamic Evolving CVMs (DE-CVM), a kernel-based dynamic evolving learning system; (4) Meta-Learning String Kernel CVM.

OCVM addresses the issue of one-pass learning on large, high-dimensional stream data through a kernel-based online learning process. OCVM is proposed for large-scale classification by leveraging connections between learning and computational geometry. It imposes the constraint that only a single pass over the data is allowed. Standard support vector machine (SVM) training has O(m^3) time and O(m^2) space complexity, where m is the training set size, and is therefore computationally infeasible on very large data sets. The proposed OCVM inherits the advantages of the Core Vector Machine (CVM) algorithm, which can be used with non-linear kernels and has a time complexity that is linear in m and a space complexity that is independent of m.

HCVM addresses the skewed class distribution problem for hierarchical stream data by identifying sub-classes through a clustering process, creating child CVMs based on the hierarchical labels, and applying supervised learning to update the core vectors. This puts strong emphasis on the unique problem subspaces and makes parent classes easier to discriminate by locally modelling their child classes.

DE-CVM takes HCVM a step further by implementing an evolving clustering process. DE-CVM evolves through incremental, hybrid learning and accommodates new input stream data, including new features and new classes, through local element tuning. New core vectors are created and updated while the system is operating. In contrast to HCVM, DE-CVM can work not only on hierarchical data but also on any numerical stream data.

Meta-Learning String Kernel CVM is proposed for learning from stream data in string format. Recently, string kernel based support vector machines have shown competitive performance in tasks such as text classification and protein homology detection. Meta-Learning String Kernel CVM improves the effectiveness of traditional string kernel SVMs by learning meta-knowledge and adopting CVMs.

The novel stream learning methods outlined above have been applied to the following three real-world data modelling problems:

1. Hierarchical network data intrusion detection;
2. Face membership authentication;
3. String data (i.e. spam email, news and malicious software) classification.

These solutions constitute the main contribution of this research to the area of applied information science. In addition to the above contributions, two stream data visualisation systems were developed: the network intrusion detection visualisation system (NIDVS) and the HCVM prototype system. These systems overcome the difficulty of monitoring stream data learning progress and also provide a better understanding of local modelling.

In summary, real-world problems consist of many smaller problems. It was found beneficial to acknowledge the existence of these sub-problems and to address them through the use of local models. The core vectors extracted from the local models also make new knowledge available to researchers and would allow more in-depth study of the sub-problems in future research.
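
The abstract's link between learning and computational geometry can be illustrated with a one-pass enclosing-ball update, since CVM-style methods recast kernel learning as a minimum enclosing ball problem. The sketch below is not the thesis's OCVM: it works in plain input space rather than a kernel feature space, and the class name `StreamingBall` and its methods are hypothetical names chosen for illustration. It only shows how a single pass with memory independent of the number of points m can keep every seen point enclosed.

```python
import math


class StreamingBall:
    """One-pass enclosing ball (illustrative sketch, not the thesis's OCVM).

    Stores only a centre and a radius, so memory does not grow with the
    number of points seen; each point is processed once.
    """

    def __init__(self):
        self.center = None
        self.radius = 0.0

    def update(self, point):
        p = [float(x) for x in point]
        if self.center is None:
            # The first point becomes a ball of radius zero.
            self.center = p
            return
        dist = math.dist(p, self.center)
        if dist <= self.radius:
            return  # point already covered, nothing to change
        # Smallest ball containing both the old ball and the new point:
        # move the centre towards the point and grow the radius.
        shift = (dist - self.radius) / (2.0 * dist)
        self.center = [c + shift * (x - c) for c, x in zip(self.center, p)]
        self.radius = (self.radius + dist) / 2.0

    def contains(self, point, tol=1e-9):
        return math.dist(point, self.center) <= self.radius + tol


stream = [(0, 0), (2, 0), (0, 2), (1, 1), (-1, 0)]
ball = StreamingBall()
for x in stream:
    ball.update(x)
```

Each update yields a ball that contains the previous ball, so points seen earlier stay enclosed without being stored. The resulting ball is generally larger than the true minimum enclosing ball; that is the price of the single pass.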


Bibliographic details

Author: Chen Ye (Gary)
Year: 2012
Original format: PDF
Language: en

