Polygonal Coordinate System: Visualizing high-dimensional data using geometric DR, and a deterministic version of t-SNE

Abstract

Dimensionality Reduction (DR) is useful for understanding high-dimensional data. It attracts wide attention from industry and academia and is employed in areas such as machine learning, data mining, and pattern recognition. This work presents a geometric approach to DR, termed the Polygonal Coordinate System (PCS), which can represent multidimensional data in two or three dimensions while preserving their inherent overall structure, by taking advantage of a polygonal interface that bridges the high- and low-dimensional spaces. PCS can handle Big Data because it adopts an incremental, geometric DR with linear time complexity. A new version of t-Distributed Stochastic Neighbor Embedding (t-SNE), a state-of-the-art DR algorithm, is also provided. It employs a PCS-based deterministic strategy and is named t-Distributed Deterministic Neighbor Embedding (t-DNE). Several synthetic and real data sets, serving as well-known archetypes of real-world problems, were used in our benchmark to evaluate PCS and t-DNE against four embedding-based DR algorithms: two linear-transformation ones (Principal Component Analysis and Non-negative Matrix Factorization) and two nonlinear ones (t-SNE and Sammon's Mapping). Statistical comparisons of the execution times of these algorithms, using Friedman's significance test, highlight the efficiency of PCS in data embedding. PCS tends to surpass its counterparts in several aspects explored in this work, including asymptotic time and space complexity, preservation of global data-inherent structures, number of hyperparameters, and applicability to unobserved data.
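
For reference, the baseline that t-DNE builds on is standard t-SNE (van der Maaten and Hinton, 2008), summarized in the block that follows. Gaussian affinities p_{j|i} between input points x_i, with bandwidths σ_i tuned to a user-set perplexity, are symmetrized into p_ij. Student-t (one degree of freedom) affinities q_ij are defined between the n embedded points y_i, and the embedding minimizes the Kullback–Leibler divergence C by gradient descent. In common practice the y_i are initialized at random, so separate runs can yield different maps. The abstract states only that t-DNE adopts a PCS-based deterministic strategy and does not say which of these steps it changes, so these formulas describe plain t-SNE, not t-DNE.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}.
\end{aligned}
```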
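
The abstract compares execution times with Friedman's significance test. A minimal, standard-library-only sketch of that test follows; it illustrates the technique and is not the authors' code. Each data set is treated as a block, run times are ranked within it (1 = fastest, ties share their average rank), and the statistic χ²_F = 12N / (k(k+1)) · [Σ_j R_j² − k(k+1)²/4] is computed from the mean ranks R_j of the k algorithms over N data sets. The function names `rank_row` and `friedman_statistic` and the timing table are hypothetical. The sketch applies no tie correction and computes no p-value; a library routine such as `scipy.stats.friedmanchisquare` also reports a p-value.

```python
from typing import List, Sequence, Tuple


def rank_row(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest run time); ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    return ranks


def friedman_statistic(times: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Friedman chi-square statistic (no tie correction) and the mean rank of each algorithm.

    times[d][a] is the run time of algorithm a on data set d (hypothetical layout).
    """
    n = len(times)     # N: number of data sets (blocks)
    k = len(times[0])  # k: number of algorithms (treatments)
    rank_sums = [0.0] * k
    for row in times:
        for a, r in enumerate(rank_row(row)):
            rank_sums[a] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks


if __name__ == "__main__":
    # Hypothetical run times in seconds: 4 data sets (rows) x 3 algorithms A, B, C (columns).
    toy_times = [
        [1.0, 2.0, 3.0],
        [1.5, 2.5, 3.5],
        [0.9, 3.0, 2.0],
        [1.2, 2.2, 4.0],
    ]
    stat, ranks = friedman_statistic(toy_times)
    print(f"chi2_F = {stat:.3f}, mean ranks = {ranks}")
```

For this toy table the mean ranks are 1.0, 2.25 and 2.75, and χ²_F = 6.5. That value exceeds 5.991, the 0.95 quantile of the chi-square distribution with k − 1 = 2 degrees of freedom, so with these made-up numbers the hypothesis of equal run times would be rejected at α = 0.05.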

Bibliographic details

  • Source
    Expert Systems with Applications | 2021, No. 8 | 114741.1–114741.34 | 34 pages
  • Author affiliations

    Fed Univ Para Appl Electromagnetism Lab 01 Augusto Correa BR-66075110 Belem Para Brazil;

    Fed Univ Para Appl Electromagnetism Lab 01 Augusto Correa BR-66075110 Belem Para Brazil|Inst Tecnol Vale 955 Boaventura Silva BR-66055090 Belem Para Brazil;

    Fed Univ Para Appl Electromagnetism Lab 01 Augusto Correa BR-66075110 Belem Para Brazil;

    Inst Tecnol Vale 955 Boaventura Silva BR-66055090 Belem Para Brazil;

    Fed Univ Para Appl Electromagnetism Lab 01 Augusto Correa BR-66075110 Belem Para Brazil;

  • Original format: PDF
  • Text language: English
  • Keywords

    Dimensionality reduction; Embedding; Visualization; Machine learning; Big data;
