Significance analysis of high-dimensional, low-sample size partially labeled data

Lu Qiyi; Qiao Xingye


Abstract

Classification and clustering are both important topics in statistical learning. A natural question is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be interested in knowing whether the two classes defined by some class labels (when they are provided), or the two clusters tagged by a clustering algorithm (when class labels are not provided), are from the same underlying distribution. Although both are challenging questions for high-dimensional, low-sample size data, there have been recent developments for both. However, when it is costly to manually label observations, often only a small portion of the class labels is available. In this article, we propose a significance analysis method for this type of data, namely partially labeled data. Our method makes use of the whole data set and tests the class difference as if all the labels were observed. Compared to a testing method that ignores the label information, our method provides greater power while maintaining the size, as illustrated by a comprehensive simulation study. Theoretical properties of the proposed method are studied with emphasis on the high-dimensional, low-sample size setting. Our simulated examples help to understand when and how the information extracted from the labeled data can be effective. A real data example further illustrates the usefulness of the proposed method. (C) 2016 Elsevier B.V. All rights reserved.
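
To make the underlying question concrete, the following is a minimal sketch of a generic permutation test for whether two fully labeled classes come from the same distribution. It is not the authors' partially labeled method, which the abstract does not specify in detail. The function name, the choice of test statistic (Euclidean distance between class means), the simulated data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def permutation_test_mean_difference(X, labels, n_perm=1000, seed=0):
    """Permutation test of H0: both classes share one distribution.

    Hypothetical helper for illustration; assumes binary labels coded 0/1.
    The statistic is the Euclidean distance between the two class means.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    def statistic(lab):
        return np.linalg.norm(X[lab == 1].mean(axis=0) - X[lab == 0].mean(axis=0))

    observed = statistic(labels)
    # Shuffling the labels breaks any link between labels and observations,
    # which approximates the null distribution of the statistic.
    perm_stats = np.array(
        [statistic(rng.permutation(labels)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
    return observed, p_value


# Simulated high-dimensional, low-sample size data: 20 observations, 500 features.
data_rng = np.random.default_rng(1)
n, d = 20, 500
labels = np.repeat([0, 1], n // 2)

X_null = data_rng.normal(size=(n, d))   # no class difference
X_alt = X_null.copy()
X_alt[labels == 1] += 0.5               # mean shift in every feature for class 1

obs_null, p_null = permutation_test_mean_difference(X_null, labels)
obs_alt, p_alt = permutation_test_mean_difference(X_alt, labels)
print(f"no class difference: p = {p_null:.3f}")
print(f"shifted class mean:  p = {p_alt:.3f}")
```

In this sketch every observation has a label. The setting in the abstract differs in that only a small portion of the labels is observed, so a test of this kind cannot be applied directly to the whole data set.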

Bibliographic details

Source: Journal of Statistical Planning and Inference, 2016, 17 pages
Authors: Lu Qiyi; Qiao Xingye
Original format: PDF
Language: English
CLC classification: Statistics
Keywords: Classification; Clustering; High-dimensional, low-sample size data; Hypothesis test; Semi-supervised learning

