Conference: European conference on machine learning and knowledge discovery in databases

Massively Parallel Feature Selection: An Approach Based on Variance Preservation

Abstract

Advances in computer technologies have enabled corporations to accumulate data at an unprecedented speed. Large-scale business data might contain billions of observations and thousands of features, which easily brings their scale to the level of terabytes. Most traditional feature selection algorithms are designed for a centralized computing architecture. Their usability significantly deteriorates when data size exceeds hundreds of gigabytes. High-performance distributed computing frameworks and protocols, such as the Message Passing Interface (MPI) and MapReduce, have been proposed to facilitate software development on grid infrastructures, enabling analysts to process large-scale problems efficiently. This paper presents a novel large-scale feature selection algorithm that is based on variance analysis. The algorithm selects features by evaluating their abilities to explain data variance. It supports both supervised and unsupervised feature selection and can be readily implemented in most distributed computing environments. The algorithm was developed as a SAS High-Performance Analytics procedure, which can read data in distributed form and perform parallel feature selection in both symmetric multiprocessing mode and massively parallel processing mode. Experimental results demonstrated the superior performance of the proposed method for large-scale feature selection.
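
The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way a variance-based, data-parallel selector could look; it is not the paper's algorithm or the SAS procedure. It assumes the unsupervised case and a greedy rule: each step picks the feature that explains the most remaining variance across all features, then removes that explained part from the covariance matrix. The function names (`chunk_stats`, `merge_stats`, `covariance`, `select_features`) and the threshold `eps` are hypothetical. The per-chunk statistics are additive, which is what would let a MapReduce job or an MPI reduction combine them across nodes.

```python
from typing import List, Sequence, Tuple

# (row count, per-feature sums, cross-product sums); a hypothetical layout.
Stats = Tuple[int, List[float], List[List[float]]]

def chunk_stats(rows: Sequence[Sequence[float]]) -> Stats:
    """Map step: sufficient statistics for the rows held by one node."""
    rows = list(rows)
    d = len(rows[0])
    total = [0.0] * d
    cross = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            total[i] += r[i]
            for j in range(d):
                cross[i][j] += r[i] * r[j]
    return len(rows), total, cross

def merge_stats(a: Stats, b: Stats) -> Stats:
    """Reduce step: statistics from two nodes simply add up."""
    total = [x + y for x, y in zip(a[1], b[1])]
    cross = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return a[0] + b[0], total, cross

def covariance(stats: Stats) -> List[List[float]]:
    """Sample covariance matrix recovered from the merged statistics."""
    n, total, cross = stats
    d = len(total)
    return [[(cross[i][j] - total[i] * total[j] / n) / (n - 1)
             for j in range(d)] for i in range(d)]

def select_features(cov: List[List[float]], k: int, eps: float = 1e-12) -> List[int]:
    """Greedy selection (an assumed rule): pick the feature that explains the
    most residual variance, then deflate the residual covariance."""
    d = len(cov)
    resid = [row[:] for row in cov]
    chosen: List[int] = []
    for _ in range(min(k, d)):
        best, best_gain = None, eps
        for j in range(d):
            if j in chosen or resid[j][j] <= eps:
                continue  # already selected, or nothing left to explain
            # Variance of all features explained by regressing them on feature j.
            gain = sum(resid[i][j] ** 2 for i in range(d)) / resid[j][j]
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # the remaining features add no new variance
        chosen.append(best)
        col = [resid[i][best] for i in range(d)]
        pivot = col[best]
        for i in range(d):
            for j in range(d):
                resid[i][j] -= col[i] * col[j] / pivot
    return chosen

# Toy data split across two "nodes": f1 = 2 * f0 (redundant), f2 is uncorrelated.
chunk_a = [[-1.0, -2.0, -0.5], [1.0, 2.0, -0.5]]
chunk_b = [[-1.0, -2.0, 0.5], [1.0, 2.0, 0.5]]
stats = merge_stats(chunk_stats(chunk_a), chunk_stats(chunk_b))
selected = select_features(covariance(stats), k=3)
print(selected)
```

On the toy data the selector keeps only one of the two collinear features, followed by the independent one, even though three were requested. Only the small d-by-d statistics leave each node, so in this sketch the selection itself runs on a single machine after the reduction.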
