
Parallel Framework for Dimensionality Reduction of Large-Scale Datasets


Abstract

Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving selected properties of that data. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain a qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify the key components underlying spectral dimensionality reduction techniques and propose an efficient parallel implementation of them. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells, in order to identify how processing parameters affect morphology evolution.
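The abstract refers to key components shared by spectral dimensionality reduction techniques without listing them here. As a rough illustration only, the following serial NumPy sketch shows one common spectral method, classical multidimensional scaling, split into three steps that such methods typically share: building a pairwise distance matrix, normalizing (double-centering) it, and taking the leading eigenpairs. The choice of classical MDS, the function name `classical_mds`, and the toy data are assumptions made for this example; this is not the authors' parallel implementation.

```python
import numpy as np


def classical_mds(points, n_components=2):
    """Embed the rows of `points` into `n_components` dimensions.

    Hypothetical helper for illustration; a serial sketch of classical MDS,
    not the parallel framework described in the paper.
    """
    # Step 1: pairwise squared Euclidean distances (n x n matrix).
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip small negative round-off

    # Step 2: double centering, B = -1/2 * J * D * J with J = I - (1/n) 11^T.
    n = points.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * centering @ sq_dists @ centering

    # Step 3: leading eigenpairs of the symmetric matrix B.
    # np.linalg.eigh returns eigenvalues in ascending order, so reorder.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.maximum(eigvals[order], 0.0)

    # Low-dimensional coordinates: eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy data (assumed for the example): 20 points that lie on a 2-D plane
# embedded in a 50-dimensional space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 2))
basis, _ = np.linalg.qr(rng.normal(size=(50, 2)))  # orthonormal 50 x 2 basis
high_dim = latent @ basis.T

embedding = classical_mds(high_dim, n_components=2)
print(embedding.shape)
```

In this sketch the distance matrix is n x n for n points, so when the data has few points but very many dimensions, the cost of Step 1 is dominated by the dimension of each point, while Steps 2 and 3 depend only on n.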

Bibliographic details

  • Source
    Scientific Programming | 2015, Issue 2015 | 180214.1-180214.12 | 12 pages
  • Author affiliations

    Georgia Inst Technol, Dept Mech Engn, Atlanta, GA 30080 USA;

    SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14620 USA|SUNY Buffalo, Dept Biomed Informat, Buffalo, NY 14620 USA;

    Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA;

    Iowa State Univ, Dept Mech Engn, Ames, IA 50011 USA;

  • Indexed in: Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
