Conference: European Conference on Machine Learning (ECML 2007); 2007-09-17 to 2007-09-21; Warsaw (PL)

Stability Based Sparse LSI/PCA: Incorporating Feature Selection in LSI and PCA


Abstract

The stability of sample-based algorithms is a concept commonly used for parameter tuning and validity assessment. In this paper we focus on two well-studied algorithms, LSI and PCA, and propose a feature selection process that provably guarantees the stability of their outputs. Features are selected so that the (statistical) accuracy of the LSI/PCA input matrices is adequate for computing meaningful (stable) eigenvectors. The feature selection process "sparsifies" LSI/PCA: the instances are projected onto the eigenvectors of a principal submatrix of the original input matrix, which produces sparse factor loadings that are linear combinations of the selected features only. We use bootstrap confidence intervals to assess the statistical accuracy of the input sample matrices, and matrix perturbation theory to relate that statistical accuracy to the stability of the eigenvectors. Experiments on several UCI datasets empirically verify our approach.
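The ingredients named in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the function names `bootstrap_cov_error` and `sparse_leading_eigvec`, the use of the sample covariance matrix as the PCA input, the spectral-norm error measure, and the comparison of that error against the eigengap (in the spirit of Davis-Kahan-type perturbation bounds) are all assumptions made for illustration. The feature subset is taken as given rather than selected.

```python
import numpy as np


def bootstrap_cov_error(X, n_boot=200, alpha=0.05, seed=0):
    """Bootstrap estimate of how far the sample covariance may be off.

    Returns the sample covariance C and the (1 - alpha) quantile of
    ||C_b - C||_2 over bootstrap resamples C_b (an illustrative error
    measure, assumed here; the paper may use a different one).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = np.atleast_2d(np.cov(X, rowvar=False))
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        C_b = np.atleast_2d(np.cov(X[idx], rowvar=False))
        errs[b] = np.linalg.norm(C_b - C, ord=2)
    return C, float(np.quantile(errs, 1.0 - alpha))


def sparse_leading_eigvec(X, selected, **boot_kwargs):
    """Leading eigenvector of the principal submatrix on `selected`.

    The eigenvector is zero-padded to the full feature space, so the
    loading is a linear combination of the selected features only.
    Also returns the bootstrap error and the gap between the two
    largest eigenvalues of the submatrix.
    """
    C_sub, eps = bootstrap_cov_error(X[:, selected], **boot_kwargs)
    w, V = np.linalg.eigh(C_sub)  # eigenvalues in ascending order
    gap = w[-1] - w[-2] if len(w) > 1 else w[-1]
    loading = np.zeros(X.shape[1])
    loading[selected] = V[:, -1]
    return loading, eps, float(gap)


if __name__ == "__main__":
    # Synthetic data: two strongly correlated features plus eight noise features.
    rng = np.random.default_rng(1)
    z = rng.normal(size=500)
    X = np.column_stack([
        3 * z + 0.1 * rng.normal(size=500),
        3 * z + 0.1 * rng.normal(size=500),
        rng.normal(size=(500, 8)),
    ])
    loading, eps, gap = sparse_leading_eigvec(X, [0, 1])
    print("loading:", np.round(loading, 3))
    print("bootstrap error:", eps, "eigengap:", gap)
```

Under these assumptions, a feature subset would be treated as yielding a stable leading eigenvector when the bootstrap error is small relative to the eigengap; a selection procedure could search for subsets that satisfy such a condition.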
