Journal: Quality Control and Applied Statistics

Significance analysis of high-dimensional, low-sample size partially labeled data



Abstract

Purpose: To propose a method for classification/clustering based on significance analysis of high-dimensional, low-sample size data for which only a small portion of the class labels is available (partially labeled data).

Summary: The article highlights the role that classification and clustering play in statistical learning. Testing the difference between two classes is challenging for high-dimensional, low-sample size (HDLSS) data. While approaches exist for such data, the problem becomes harder when many observations lack class labels (partially labeled data). The article develops a significance testing method for HDLSS partially labeled data. Two significance analysis methods are considered:
  • the DiProPerm test, which is applicable when all class labels are known, and
  • the statistical significance of clustering test (SigClust), which does not require labels.
These test methods are reviewed in detail from the perspective of their application to HDLSS data, and the proposed test method for the significance analysis of HDLSS partially labeled data (SigPal) is presented. Some theoretical results are studied, with an emphasis on the HDLSS data setting. A comprehensive simulation study illustrates the proposed test method. A real application to breast cancer data is also studied to demonstrate the usefulness of the proposed method, and the results are discussed. (41 refs.)

Results: While classification and clustering are important tools in statistical learning, their successful application depends on the nature of the data at hand. In classification, class labels are generally provided before the analysis, whereas in clustering such labels are unavailable. High-dimensional, low-sample size (HDLSS) data make both tasks more challenging.
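The fully labeled case mentioned in the abstract (the DiProPerm test) can be illustrated with a minimal sketch. It is not the article's SigPal procedure, and it is not taken from the article: the function name `diproperm_pvalue`, the choice of the mean-difference direction, the projected mean-difference statistic, and the add-one p-value correction are all assumptions made here for illustration. The general idea is to project the data onto a direction separating the two classes, compute a univariate two-sample statistic, and compare it with the same statistic recomputed under random relabelings.

```python
import numpy as np


def diproperm_pvalue(X, y, n_perm=500, seed=0):
    """Permutation p-value for the separation of two labeled classes.

    Hypothetical helper for illustration only.
    X: (n, d) data matrix; y: length-n array of 0/1 labels.
    Assumptions: the direction is the normalized difference of class means,
    and the statistic is the difference of projected class means.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    def statistic(labels):
        # Direction: normalized difference of the two class means.
        diff = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
        norm = np.linalg.norm(diff)
        if norm == 0.0:
            return 0.0
        # Projection: one score per observation along that direction.
        scores = X @ (diff / norm)
        # Univariate statistic: gap between the projected class means.
        return scores[labels == 1].mean() - scores[labels == 0].mean()

    observed = statistic(y)
    # Permutation: recompute direction and statistic for shuffled labels.
    permuted = np.array(
        [statistic(rng.permutation(y)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(permuted >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)
```

The direction is re-estimated inside every permutation, so the null distribution reflects how well arbitrary labelings can be separated in high dimension, which is the main concern in the HDLSS setting.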
