Journal: Machine Learning

Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions


Abstract

We prove theoretical guarantees for an averaging ensemble of randomly projected Fisher linear discriminant (FLD) classifiers, focusing on the case where there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than the generic tools used in previous work. In particular, we derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees that directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge, these are the first theoretical results to prove such an explicit link for any classifier and classifier-ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, and that this prevents overfitting in small-sample conditions where the pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high-dimensional data, so with this approach data can be collected, stored, and processed in compressed form. We confirm our theoretical findings with experiments and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high-dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
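
To make the construction in the abstract concrete, here is a minimal sketch of an averaging ensemble of randomly projected FLD classifiers. The abstract does not give implementation details, so several choices here are assumptions: Gaussian projection matrices, a pooled within-class covariance estimate, equal class priors, and a projection dimension `k` smaller than `N - 2` so that each projected covariance is invertible. The function names `fit_rp_fld_ensemble` and `predict` are hypothetical, and the data below is synthetic and for illustration only.

```python
import numpy as np

def fit_rp_fld_ensemble(X, y, k, n_members, rng):
    """Average n_members randomly projected FLD discriminants.

    X: (N, d) training data with N < d allowed; y: labels in {0, 1}.
    Returns (w, b) so that the ensemble predicts class 1 when x @ w + b > 0.
    Assumes k <= N - 2 so each projected covariance is invertible.
    """
    n, d = X.shape
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Centre each observation on its own class mean (pooled covariance).
    Xc = np.where((y == 1)[:, None], X - mu1, X - mu0)
    w = np.zeros(d)
    for _ in range(n_members):
        R = rng.standard_normal((k, d))   # assumed Gaussian random projection
        Z = Xc @ R.T                      # centred data in the k-dim space
        S = Z.T @ Z / (n - 2)             # projected pooled covariance (k x k)
        # FLD direction in the projected space, mapped back to data space.
        w += R.T @ np.linalg.solve(S, R @ (mu1 - mu0))
    w /= n_members
    b = -w @ (mu0 + mu1) / 2              # threshold at the class midpoint
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Synthetic illustration: 40 observations in 200 dimensions.
rng = np.random.default_rng(0)
d, n_per_class = 200, 20
shift = np.ones(d)                        # class 1 mean (class 0 mean is 0)
X_train = np.vstack([rng.standard_normal((n_per_class, d)),
                     rng.standard_normal((n_per_class, d)) + shift])
y_train = np.repeat([0, 1], n_per_class)
w, b = fit_rp_fld_ensemble(X_train, y_train, k=10, n_members=100, rng=rng)

X_test = np.vstack([rng.standard_normal((100, d)),
                    rng.standard_normal((100, d)) + shift])
y_test = np.repeat([0, 1], 100)
accuracy = (predict(X_test, w, b) == y_test).mean()
```

Because every member is linear, the averaged ensemble collapses to a single linear discriminant `(w, b)` in the original space. The averaged matrix of `R.T @ inv(S) @ R` terms plays the role of a regularized inverse covariance, which is the sense in which the abstract describes random projection as a regularizer.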
