Feature selection in finite mixture of sparse normal linear models in high-dimensional feature space

Abbas Khalili

Abstract

Rapid advances in modern technology have allowed scientists to collect data of unprecedented size and complexity, particularly in genomics applications. One statistical problem in such applications is modeling an output variable as a function of a small subset of a large number of features, based on a relatively small sample that may even come from multiple subpopulations. Selecting the correct predictive features (variables) for each subpopulation is therefore key. To address this issue, we consider the problem of feature selection in finite mixtures of sparse normal linear (FMSL) models in large feature spaces. We propose a 2-stage procedure to overcome the computational difficulties and large false discovery rates caused by the large model space. First, to deal with the curse of dimensionality, a likelihood-based boosting procedure is designed to effectively reduce the number of candidate features; this is the key thrust of our new method. The greatly reduced set of features is then subjected to a sparsity-inducing procedure via a penalized likelihood method. A novel scheme is also proposed for the difficult problem of finding good starting points for the expectation–maximization (EM) estimation of the mixture parameters. We use an extended Bayesian information criterion to determine the final FMSL model. Simulation results indicate that the procedure succeeds in selecting the significant features without including a large number of insignificant ones. A real data example on gene transcription regulation is also presented.
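
The abstract names the model and the final selection criterion but gives no formulas. The Python sketch below is a minimal illustration built on several assumptions. It assumes the standard FMSL density, in which y given x is a K-component mixture with mixing proportions pi_k and component densities N(x^T beta_k, sigma_k^2). It also assumes that the extended BIC takes the form of Chen and Chen (2008) and that the K*p per-component regression coefficients form the candidate pool. The function names fmsl_loglik and ebic and the degrees-of-freedom bookkeeping are hypothetical choices made for illustration; they are not the paper's code. The boosting and penalized-likelihood stages are not shown.

```python
import math
from typing import List, Sequence


def fmsl_loglik(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    weights: Sequence[float],          # mixing proportions pi_k (positive, sum to 1)
    betas: Sequence[Sequence[float]],  # coefficient vector beta_k per component
    sds: Sequence[float],              # error standard deviation sigma_k per component
) -> float:
    """Observed-data log-likelihood of a K-component mixture of normal linear models.

    Each observation contributes log sum_k pi_k * N(y_i; x_i^T beta_k, sigma_k^2),
    computed with the log-sum-exp trick to avoid underflow. An intercept, if
    wanted, can be supplied as a constant column of X.
    """
    half_log_2pi = 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for xi, yi in zip(X, y):
        # log(pi_k) + log normal density, one term per component
        terms: List[float] = []
        for pi_k, beta_k, sd_k in zip(weights, betas, sds):
            mean = sum(b * x for b, x in zip(beta_k, xi))
            z = (yi - mean) / sd_k
            terms.append(math.log(pi_k) - 0.5 * z * z - math.log(sd_k) - half_log_2pi)
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def ebic(
    loglik: float,
    n: int,
    num_candidates: int,
    num_selected: int,
    num_other_params: int,
    gamma: float = 1.0,
) -> float:
    """Extended BIC in the Chen and Chen (2008) form (its use here is an assumption).

    -2 * loglik + (num_selected + num_other_params) * log(n)
        + 2 * gamma * log(C(num_candidates, num_selected))
    With gamma = 0 this reduces to the ordinary BIC.
    """
    df = num_selected + num_other_params
    log_model_count = math.log(math.comb(num_candidates, num_selected))
    return -2.0 * loglik + df * math.log(n) + 2.0 * gamma * log_model_count


if __name__ == "__main__":
    # Hypothetical toy data: p = 2 features, K = 2 components with different active features.
    X = [[1.0, 0.5], [0.2, 1.5], [1.3, -0.4], [-0.7, 0.9]]
    y = [2.1, -1.4, 2.5, -0.8]
    ll = fmsl_loglik(X, y, weights=[0.5, 0.5],
                     betas=[[2.0, 0.0], [0.0, -1.0]], sds=[0.5, 0.5])
    # 2 nonzero coefficients out of K*p = 4 candidates;
    # other parameters: 1 free mixing proportion + 2 standard deviations.
    score = ebic(ll, n=len(y), num_candidates=4, num_selected=2,
                 num_other_params=3, gamma=1.0)
    print(ll, score)
```

Under these assumptions, you would fit several candidate models with the penalized EM stage, for example over a grid of tuning parameters. You would then compare their ebic scores and keep the model with the smallest score, which roughly matches the role the abstract gives the criterion. A larger gamma puts a heavier penalty on the size of the model space.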

Bibliographic details

Source: Biostatistics | 2011, No. 1 | pp. 156-172 | 17 pages
Author: Abbas Khalili
Original format: PDF
Language: English

Similar documents

1. Feature selection in finite mixture of sparse normal linear models in high-dimensional feature space [J]. Khalili A., Chen J., Lin S. Biostatistics. 2011, No. 1.
2. A finite mixture model for simultaneous high-dimensional clustering, localized feature selection and outlier rejection [J]. Nizar Bouguila, Khaled Almakadmeh, Sabri Boutemedjet. Expert Systems with Applications. 2012, No. 7.
3. Simultaneous high-dimensional clustering and feature selection using asymmetric Gaussian mixture models [J]. Tarek Elguebaly, Nizar Bouguila. Image and Vision Computing. 2015.
4. A Hybrid Ensemble Feature Selection-Based Learning Model for COPD Prediction on High-Dimensional Feature Space [C]. Srinivas Raja Banda Banda, Tummala Ranga Babu. International Conference on Data Engineering and Communication Technology. 2020.
5. Feature extraction techniques in high-dimensional spaces: Linear and nonlinear approaches [D]. Cevikalp, Hakan. 2005.
6. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data [O]. Andrea Bommert, Jörg Rahnenführer, Michel Lang. 2017.
