Improved Data Streams Classification with Fast Unsupervised Feature Selection

Abstract

Data stream classification poses three major challenges: infinite length, concept drift, and feature evolution. The first two have been widely studied, but most existing data stream classification techniques ignore the third. DXMiner [17] is the first model to address feature evolution; it uses past labeled instances to select the top-ranked features according to a score computed by a formula. This semi-supervised feature selection method depends on the quality of past classification and neglects possible correlations among features, so it cannot produce an optimal feature subset, which degrades classification accuracy. Multi-Cluster Feature Selection (MCFS) [5], proposed for static data classification and clustering, applies unsupervised feature selection to address the feature-evolution problem, but it suffers from high computational cost in feature selection. In this paper, we apply MCFS within the DXMiner framework to handle each window of data in a data stream for dynamic data stream classification. With unsupervised feature selection, our method produces the optimal feature subset and hence improves on DXMiner's classification accuracy. We further improve the time complexity of the feature selection process in MCFS by using the locality-sensitive hashing forest (LSH Forest) [4]. The empirical results indicate that our approach outperforms state-of-the-art stream classification techniques in classifying real-life data streams.
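
The abstract does not say which step of MCFS the LSH Forest accelerates. MCFS starts by building a nearest-neighbor graph over the data, so one plausible reading is that the forest replaces the exact neighbor search for each window. The following is a minimal sketch of that idea only, not the authors' implementation. The names `LSHForestSketch`, `knn_graph`, and `cosine_distance`, the sign-random-projection hash family, the cosine ranking, and all parameter defaults are assumptions made for illustration. Each "tree" is simplified to a sorted list of bit strings, and the query shortens the hash prefix until enough candidates are found.

```python
import bisect
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


class LSHForestSketch:
    """Simplified LSH Forest: each "tree" is a sorted list of (bit string, point id)."""

    def __init__(self, dim, n_trees=8, max_bits=16, seed=0):
        rng = random.Random(seed)
        self.max_bits = max_bits
        # Assumption: sign-random-projection hashing, one hyperplane per bit.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(max_bits)]
            for _ in range(n_trees)
        ]
        self.trees = [[] for _ in range(n_trees)]
        self.points = []

    def _hash(self, tree_idx, x):
        # Bit j is "1" when x lies on the non-negative side of hyperplane j.
        return "".join(
            "1" if sum(p * v for p, v in zip(plane, x)) >= 0 else "0"
            for plane in self.planes[tree_idx]
        )

    def fit(self, points):
        self.points = [list(p) for p in points]
        for t in range(len(self.trees)):
            self.trees[t] = sorted(
                (self._hash(t, p), i) for i, p in enumerate(self.points)
            )
        return self

    def _candidates(self, x, min_candidates):
        hashes = [self._hash(t, x) for t in range(len(self.trees))]
        found = set()
        # Start from the full-length hash and shorten the prefix until
        # enough candidates are collected (length 0 matches every point).
        for length in range(self.max_bits, -1, -1):
            for tree, h in zip(self.trees, hashes):
                prefix = h[:length]
                lo = bisect.bisect_left(tree, (prefix, -1))
                # "2" sorts after "0" and "1", so this bounds the prefix range.
                hi = bisect.bisect_left(tree, (prefix + "2", -1))
                found.update(i for _, i in tree[lo:hi])
            if len(found) >= min_candidates:
                break
        return found

    def kneighbors(self, x, k):
        """Return ids of up to k approximate nearest neighbours of x."""
        cand = self._candidates(x, min_candidates=k)
        return sorted(cand, key=lambda i: (cosine_distance(self.points[i], x), i))[:k]


def knn_graph(window, k, **forest_kwargs):
    """Approximate k-NN adjacency lists for one window of instances."""
    forest = LSHForestSketch(dim=len(window[0]), **forest_kwargs).fit(window)
    graph = {}
    for i, x in enumerate(window):
        # Ask for k + 1 because the point itself is always a candidate.
        neighbours = forest.kneighbors(x, k + 1)
        graph[i] = [j for j in neighbours if j != i][:k]
    return graph
```

In a windowed setting, `knn_graph(window, k)` would be called once per window of the stream, and the resulting adjacency lists would feed the spectral embedding and sparse regression steps of MCFS, which are not shown here. The neighbors returned are approximate: only points sharing a hash prefix with the query in at least one tree are ranked.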

Bibliographic details

Source
International Conference on Parallel and Distributed Computing, Applications and Technologies | 2016 | pp. 221-226 | 6 pages
Authors
Lulu Wang; Hong Shen
Original format: PDF
Keywords
Data models; Time complexity; Distributed databases; Algorithm design and analysis; Support vector machines; Clustering algorithms; Classification algorithms

Similar literature

1. A new approach for data stream classification: unsupervised feature representational online sequential extreme learning machine [J]. Ozge Aydogdu, Murat Ekinci. Multimedia Tools and Applications. 2020, Issue 37-38
2. Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data [J]. Genomics. 2020, Issue 2
3. Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data [J]. Pilar García-Díaz, Isabel Sánchez-Berriel, Juan A. Martínez-Rojas. Genomics. 2020, Issue 2
4. Improved Data Streams Classification with Fast Unsupervised Feature Selection [C]. Lulu Wang, Hong Shen. International Conference on Parallel and Distributed Computing, Applications and Technologies. 2016
5. Unsupervised data mining methods for functional data analysis and feature selection [D]. Rattakorn, Panaya. 2009
6. Peculiar Genes Selection: A new features selection method to improve classification performances in imbalanced data sets [O]. Federica Martina, Marco Beccuti, Gianfranco Balbo
7. Unsupervised Feature Selection Based on Ultrametricity and Sparse Training Data: A Case Study for the Classification of High-Dimensional Hyperspectral Data [O]. Patrick Bradley, Sina Keller, Martin Weinmann. 2018
8. Improved Feature Extraction, Feature Selection, and Identification Techniques That Create a Fast Unsupervised Hyperspectral Target Detection Algorithm [R]. Johnson, R. J. 2008
