A non-parametric maximum for number of selected features: objective optima for FDR and significance threshold with application to ordinal survey analysis

Amir Hassan Ghaseminejad Tafreshi


Abstract

This paper identifies a criterion for choosing an optimum set of selected features, or rejected null hypotheses, in high-dimensional data analysis. The method is designed for dimension reduction with multiple hypothesis testing, as used in the filtering stage of big data analysis and in exploratory research, to identify significant associations among many predictor variables and a few outcomes. The novelty of the proposed method is that the selected p-value threshold is insensitive to dependency within features, and between features and the outcome. The method neither requires a predetermined threshold for the level of significance nor uses a presumed threshold for the false discovery rate (FDR). With the presented method, the optimum p-value for a powerful yet parsimonious model is chosen; then, for every set of rejected hypotheses, the researcher can also report traditional measures of statistical accuracy, such as the expected number of false positives and the false discovery rate. The upper limit for the number of rejected hypotheses (or selected features) is determined by finding the maximum difference between expected true hypotheses and expected false hypotheses among all possible sets of rejected hypotheses. Methods for choosing an optimum number of selected features, such as piecewise regression, are then used to form a parsimonious model. The paper reports the results of implementing the proposed methods in a novel example of non-parametric analysis of high-dimensional ordinal survey data.
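The upper-limit idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that, when the k smallest of m p-values are rejected at threshold p, the expected number of false rejections is approximated by m·p (capped at k) and the expected number of true rejections by k minus that value. The function name `max_reasonable_rejections` and its return layout are hypothetical.

```python
def max_reasonable_rejections(p_values):
    """Sketch: pick the number of rejections k that maximizes
    (expected true rejections) - (expected false rejections).

    Assumption (not taken from the paper's text): rejecting the k smallest
    of m p-values at threshold p gives about m * p expected false
    rejections, capped at k.
    Returns (k, p_threshold, expected_false, estimated_fdr).
    """
    m = len(p_values)
    ordered = sorted(p_values)
    best_k, best_gap = 0, 0.0
    for k, p in enumerate(ordered, start=1):
        expected_false = min(m * p, k)
        expected_true = k - expected_false
        gap = expected_true - expected_false
        if gap > best_gap:
            best_k, best_gap = k, gap
    if best_k == 0:
        # No rejection set has more expected true than false discoveries.
        return 0, 0.0, 0.0, 0.0
    threshold = ordered[best_k - 1]
    expected_false = min(m * threshold, best_k)
    return best_k, threshold, expected_false, expected_false / best_k


if __name__ == "__main__":
    # Illustrative p-values only; not data from the paper.
    print(max_reasonable_rejections([0.5, 0.001, 0.9, 0.003, 0.2, 0.002]))
```

Under these assumptions, the returned threshold marks the largest rejection set worth considering. As the abstract notes, a smaller, more parsimonious set may then be chosen by a further step such as piecewise regression, which this sketch does not cover.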


Bibliographic record

Source: Journal of Big Data, 2018, Issue 1, 19 pages
Author: Amir Hassan Ghaseminejad Tafreshi
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: High-dimensional data analysis; Dimension reduction; Feature selection; Multiple hypothesis testing; False discovery rate; Optimum significance threshold; Maximum for reasonable number of rejected hypotheses; Big data analysis


