Venue: International Conference on Parallel and Distributed Computing, Applications and Technologies

Improved Data Streams Classification with Fast Unsupervised Feature Selection

Abstract

Data stream classification poses three major challenges: infinite length, concept drift, and feature evolution. The first two have been widely studied, but most existing data stream classification techniques ignore the third. DXMiner [17] is the first model to address feature evolution; it uses past labeled instances to select the top-ranked features according to a score computed by a formula. This semi-supervised feature selection method depends on the quality of past classifications and neglects possible correlations among features, so it cannot produce an optimal feature subset, which degrades classification accuracy. Multi-Cluster Feature Selection (MCFS) [5], proposed for static data classification and clustering, applies unsupervised feature selection to address the feature-evolution problem, but its feature selection step is computationally expensive. In this paper, we apply MCFS within the DXMiner framework to handle each window of data in a stream for dynamic data stream classification. With unsupervised feature selection, our method produces the optimal feature subset and hence improves on DXMiner's classification accuracy. We further reduce the time complexity of the feature selection process in MCFS by using the locality-sensitive hashing forest (LSH Forest) [4]. Empirical results indicate that our approach outperforms state-of-the-art stream classification techniques on real-life data streams.
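To make the speed-up idea concrete: MCFS builds a nearest-neighbour graph over the data, and an exact neighbour search compares every pair of points in a window. The sketch below shows how locality-sensitive hashing can cut that cost by comparing a point only against points that share a hash bucket with it. It is a minimal illustration under stated assumptions, not the paper's implementation: it uses flat random-hyperplane tables rather than the prefix-tree structure of LSH Forest, it assumes the neighbour search is the step being accelerated, and every function name and parameter (`build_tables`, `approx_knn`, `n_tables`, `n_bits`) is hypothetical.

```python
import math
import random
from collections import defaultdict


def signature(point, hyperplanes):
    """Bit tuple recording which side of each random hyperplane the point lies on."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_tables(points, n_tables=8, n_bits=4, seed=0):
    """Hash every point of one window into n_tables independent bucket maps."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(n_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, point in enumerate(points):
            buckets[signature(point, planes)].append(idx)
        tables.append((planes, buckets))
    return tables


def approx_knn(points, tables, query_idx, k):
    """Up to k nearest neighbours, searched only among bucket-mates of the query."""
    query = points[query_idx]
    candidates = set()
    for planes, buckets in tables:
        candidates.update(buckets[signature(query, planes)])
    candidates.discard(query_idx)  # a point is not its own neighbour
    ranked = sorted(candidates, key=lambda i: cosine_distance(points[i], query))
    return ranked[:k]


if __name__ == "__main__":
    # One hypothetical window of four 2-D instances.
    window = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (-3.0, -6.0)]
    tables = build_tables(window)
    for i in range(len(window)):
        print(i, approx_knn(window, tables, i, k=2))
```

Random-hyperplane hashing groups points by angle, so candidates are ranked by cosine distance here; a Euclidean neighbour graph would need a different hash family. Because the search is approximate, a true neighbour can be missed when it never shares a bucket with the query, which is the trade-off made for the lower cost.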