Journal: Concurrency and Computation: Practice and Experience

Semi-supervised incremental feature extraction algorithm for large-scale data stream


Abstract

In the big data era, processing large-scale data streams remains an open challenge. Feature extraction methods have attracted much attention because of their effectiveness for data classification. Traditional classification algorithms may take little advantage of the information in labeled samples. Online learning and the out-of-sample problem have also become hot topics recently. To address these problems, this paper proposes a novel semi-supervised incremental feature extraction algorithm. First, we extract features incrementally in an unsupervised way. Second, we propose a semi-supervised subspace learning algorithm that uses class information to adjust the k-nearest neighbor weights. Third, we combine the unsupervised and semi-supervised feature extraction approaches to obtain an objective function that solves the out-of-sample learning problem. Experiments were carried out on machine learning datasets from the University of California, Irvine (UCI) and on real-world face image datasets (Olivetti faces (ORL), Yale, YaleB, and Rendered face). To demonstrate that the proposed algorithm scales to large-scale data streams, classification experiments were performed using Spark in a parallel computing environment, with comparisons against several related semi-supervised feature extraction methods. The experimental results and the computational complexity comparison demonstrate that the proposed algorithm achieves good performance. Copyright © 2016 John Wiley & Sons, Ltd.
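The abstract does not give the weighting rule, so the following is only a minimal sketch of what "using class information to adjust k-nearest neighbor weights" could look like, not the authors' formulation. It assumes heat-kernel weights on a symmetric k-nearest-neighbor graph, a label of `-1` for unlabeled samples, and a hypothetical `boost` factor: edges between two labeled samples of the same class are strengthened, edges between two labeled samples of different classes are removed, and edges touching an unlabeled sample are left unchanged. The names `knn_weights`, `adjust_with_labels`, `sigma`, and `boost` are illustrative choices.

```python
import numpy as np


def knn_weights(X, k=3, sigma=1.0):
    """Heat-kernel weights on a symmetric k-nearest-neighbor graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


def adjust_with_labels(W, labels, boost=2.0):
    """Assumed rule (not from the paper): labels[i] == -1 means unlabeled."""
    W = W.copy()
    labeled = labels >= 0
    both = labeled[:, None] & labeled[None, :]
    same = both & (labels[:, None] == labels[None, :])
    diff = both & (labels[:, None] != labels[None, :])
    W[same] *= boost  # strengthen same-class edges
    W[diff] = 0.0     # cut edges between different classes
    return W


# Toy data: 8 samples, the first 4 labeled, the rest unlabeled.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
labels = np.array([0, 0, 1, 1, -1, -1, -1, -1])
W = knn_weights(X, k=3)
W_adj = adjust_with_labels(W, labels)
```

The adjusted matrix `W_adj` could then feed a graph-based subspace learner in place of the purely unsupervised `W`; how the paper combines the two objectives is not stated in the abstract.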
