A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets

Abstract

The Extreme Learning Machine (ELM) algorithm has attracted considerable attention from researchers and has been widely applied in recent years, especially to big data, because of its strong generalization performance and fast learning speed. The semi-supervised Extreme Learning Machine (SS-ELM) extends ELM to semi-supervised learning, an important problem in machine learning on big data. However, the original SS-ELM algorithm must hold the data in memory before processing it, so it cannot handle the large and web-scale data sets that are common in the big data era. To solve this problem, this paper first proposes an efficient parallel SS-ELM (PSS-ELM) algorithm on the MapReduce model, adopting a series of optimizations to improve its performance. It then proposes a parallel approximate SS-ELM algorithm based on MapReduce (PASS-ELM). PASS-ELM builds on the approximate adjacent similarity matrix (AASM) algorithm, which uses the Locality-Sensitive Hashing (LSH) scheme to compute an approximate adjacent similarity matrix, greatly reducing both computational complexity and memory usage. The proposed AASM algorithm is general, because computing the adjacent similarity matrix is a key operation in many other machine learning algorithms. The experimental results show that PASS-ELM can efficiently process very large-scale data sets with good performance, without significantly affecting the accuracy of the results.
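
The AASM idea described in the abstract, using LSH so that similarities are computed only between points likely to be neighbours, can be illustrated with a small single-process sketch. The abstract does not specify the hash family, the similarity kernel, or the MapReduce job layout, so everything here is an assumption made for illustration: sign-random-projection hashing, a Gaussian kernel, and the hypothetical names `lsh_signature`, `approximate_similarity`, `num_bits`, and `sigma`. The "map" and "reduce" comments only mark where the two phases would plausibly fall; this is not the paper's implementation.

```python
import math
import random
from collections import defaultdict


def lsh_signature(point, hyperplanes):
    """Sign-random-projection hash: one bit per hyperplane (assumed hash family)."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0.0 else 0
        for plane in hyperplanes
    )


def approximate_similarity(points, num_bits=4, sigma=1.0, seed=0):
    """Return a sparse {(i, j): weight} map with i < j.

    Weights are computed only for pairs that share an LSH bucket, so pairs
    in different buckets are implicitly treated as having zero similarity.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    hyperplanes = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
    ]

    # "Map" phase (assumed layout): key each point index by its bucket signature.
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_signature(point, hyperplanes)].append(idx)

    # "Reduce" phase (assumed layout): pairwise Gaussian similarity per bucket.
    sim = {}
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                d2 = sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
                sim[(i, j)] = math.exp(-d2 / (2.0 * sigma ** 2))
    return sim
```

With n points spread over many buckets, the number of kernel evaluations drops from roughly n²/2 to the sum of squared bucket sizes, which is the source of the reduced complexity and memory use the abstract mentions. The cost is that true neighbours falling into different buckets are missed, which is why the result is only approximate.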

Bibliographic information

  • Source
    Journal of Parallel and Distributed Computing | 2017, Issue 10 | pp. 85-94 | 10 pages
  • Author affiliations

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; National Supercomputing Center in Changsha, Changsha, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; National Supercomputing Center in Changsha, Changsha, Hunan 410082, China;

    National Supercomputing Center in Changsha, Changsha, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; National Supercomputing Center in Changsha, Changsha, Hunan 410082, China; Department of Computer Science, State University of New York, New Paltz, NY 12561, USA;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    PASS-ELM; MapReduce; LSH; Parallel; Approximate algorithm; Big data;
