International Journal of Data Science and Analysis

A Topological Approach of Principal Component Analysis

Abstract

Large datasets are increasingly widespread in many disciplines. The exponential growth of data requires the development of more data analysis methods in order to process information more efficiently. To better visualize the data, many methods, such as Principal Component Analysis (PCA) and MultiDimensional Scaling (MDS), make it possible to extract a low-dimensional structure from a high-dimensional data set. The proposed approach, called Topological Principal Component Analysis (TPCA), is a multidimensional descriptive method which studies a homogeneous set of continuous variables defined on the same set of individuals. It is a topological method of data analysis that consists of comparing and classifying proximity measures chosen from among the most widely used proximity measures for continuous data. Proximity measures play an important role in many areas of data analysis, and the results strongly depend on the proximity measure chosen. So, among the many existing measures, which one is most useful? Are they all equivalent? How can one identify the measure that is most appropriate for analyzing the correlation structure of a set of quantitative variables? TPCA proposes an appropriate adjacency matrix associated with an unknown proximity measure according to the data under consideration, then analyzes and visualizes, with graphic representations, the relationship structure of the variables in relation to the well-known PCA problem. It uses the concept of neighborhood graphs and compares a set of proximity measures for continuous data, which can be more or less equivalent. A topological equivalence criterion between two proximity measures is defined and statistically tested according to the topological correlation between the variables considered. An example on real data illustrates the proposed approach.
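
The idea of attaching an adjacency matrix to a proximity measure and then comparing measures can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the abstract: the choice of a relative neighborhood graph as the neighborhood graph, the two example measures (Euclidean and Manhattan distances between variables), and the simple proportion-of-agreement score used to compare two adjacency matrices. The function names are hypothetical.

```python
import numpy as np

def relative_neighborhood_adjacency(dist):
    """Binary adjacency matrix of a relative neighborhood graph.

    Assumption: variables i and j are neighbors when no third variable k
    is strictly closer to both of them, i.e.
    dist[i, j] <= max(dist[i, k], dist[j, k]) for every other k.
    """
    p = dist.shape[0]
    adj = np.zeros((p, p), dtype=int)
    for i in range(p):
        for j in range(p):
            if i == j:
                adj[i, j] = 1  # convention assumed here: a variable neighbors itself
                continue
            others = [k for k in range(p) if k != i and k != j]
            adj[i, j] = int(
                all(dist[i, j] <= max(dist[i, k], dist[j, k]) for k in others)
            )
    return adj

def euclidean_between_variables(X):
    """Euclidean distance between the columns (variables) of X."""
    diff = X[:, :, None] - X[:, None, :]  # shape (n, p, p)
    return np.sqrt((diff ** 2).sum(axis=0))

def manhattan_between_variables(X):
    """Manhattan distance between the columns (variables) of X."""
    diff = X[:, :, None] - X[:, None, :]
    return np.abs(diff).sum(axis=0)

def agreement(adj_a, adj_b):
    """Share of matching cells in two adjacency matrices (illustrative score)."""
    return float((adj_a == adj_b).mean())

# Synthetic data, standardized column by column: 50 individuals, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

adj_euclid = relative_neighborhood_adjacency(euclidean_between_variables(X))
adj_manhattan = relative_neighborhood_adjacency(manhattan_between_variables(X))
score = agreement(adj_euclid, adj_manhattan)  # 1.0 means identical neighborhood structure
```

A score of 1.0 would indicate that the two measures induce the same neighborhood structure on the variables under these assumptions. The statistical test of equivalence mentioned in the abstract is not reproduced here.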
