Journal: Emerging Topics in Computing, IEEE Transactions on

A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction


Abstract

Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. It has been a great challenge to efficiently represent and process big data with a unified scheme. In this paper, a unified tensor model is proposed to represent unstructured, semistructured, and structured data. With a tensor extension operator, various types of data are represented as subtensors and then merged into a unified tensor. In order to extract the core tensor, which is small but contains valuable information, an incremental high-order singular value decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage, and approximation accuracy of the proposed method are provided in this paper. A case study illustrates that approximate data reconstructed from the core set containing 18% of the elements can guarantee 93% accuracy in general. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.
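To make the idea of a core tensor concrete, the following is a minimal sketch of an ordinary (batch) truncated HOSVD in Python with NumPy. It is not the paper's incremental IHOSVD, whose details the abstract does not give; the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the choice of ranks are illustrative assumptions. It shows only how orthogonal bases per mode yield a small core tensor from which an approximation of the original data can be rebuilt.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode-n product: multiply `matrix` into axis `mode` of `tensor`."""
    # tensordot contracts the matrix columns with the chosen axis and places
    # the new axis first, so it is moved back to position `mode`.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Batch truncated HOSVD (assumed stand-in, not the incremental method).

    Returns (core, factors), where factors[n] holds the leading ranks[n]
    left singular vectors of the mode-n unfolding.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])

    # Project the tensor onto each orthogonal basis to obtain the core.
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild an approximation of the original tensor from the core."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((6, 5, 4))  # synthetic data, for illustration
    core, factors = truncated_hosvd(data, ranks=(3, 3, 2))
    approx = reconstruct(core, factors)
    rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
    print("core fraction of elements:", core.size / data.size)
    print("relative reconstruction error:", rel_err)
```

The ratio `core.size / data.size` corresponds to the share of elements kept in the core, and the relative error measures approximation accuracy; the figures printed for random synthetic data say nothing about the 18% and 93% values reported in the case study.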
