Journal: The Annals of Applied Statistics

SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS


Abstract

Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L_1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
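
To make the construction in the abstract concrete, the following is a minimal sketch of a trimmed, L_1-penalized objective: the sum of the h smallest squared residuals plus a penalty on the absolute coefficients. The function name `sparse_lts_objective`, the parameter names `h` and `lam`, the scaling of the penalty by `h`, and the toy data are all assumptions made for illustration; the abstract does not give the exact formula, and this sketch only evaluates the objective rather than implementing the paper's fast algorithm.

```python
import numpy as np


def sparse_lts_objective(X, y, beta, h, lam):
    """Sum of the h smallest squared residuals plus an L1 penalty.

    Illustrative form only: scaling the penalty by h is an assumption
    of this sketch, not something stated in the abstract.
    """
    residuals_sq = (y - X @ beta) ** 2
    # Keep the h smallest squared residuals; the n - h largest are trimmed.
    trimmed_sum = np.sort(residuals_sq)[:h].sum()
    return trimmed_sum + h * lam * np.abs(beta).sum()


# Toy data (hypothetical): a sparse coefficient vector and noise-free responses.
rng = np.random.default_rng(0)
n, p = 20, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ beta_true
y[0] += 100.0  # contaminate one response with a gross outlier

# With h = n no observation is trimmed, so the outlier dominates the objective.
full = sparse_lts_objective(X, y, beta_true, h=n, lam=0.1)
# With h = n - 1 the single largest squared residual (the outlier) is dropped.
trimmed = sparse_lts_objective(X, y, beta_true, h=n - 1, lam=0.1)
```

Evaluated at the true coefficients, the untrimmed objective is inflated by the single contaminated observation, while the trimmed version reduces to the penalty term alone. This is the intuition behind combining trimming, for robustness, with an L_1 penalty, for sparsity.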
