Dimensionality Reduction via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data

Abstract

Matrix factorization is a popular technique for engineering features for use in predictive models; it is viewed as a key part of the predictive analytics process and is used in many different domain areas. The purpose of this paper is to investigate matrix-factorization-based dimensionality reduction as a design artifact in predictive analytics. With the rise in availability of large amounts of sparse behavioral data, this investigation comes at a time when traditional techniques must be reevaluated. Our contribution is based on two lines of inquiry: we survey the literature on dimensionality reduction in predictive analytics, and we undertake an experimental evaluation comparing using dimensionality reduction versus not using dimensionality reduction for predictive modeling from large, sparse behavioral data.

Our survey of the dimensionality reduction literature reveals that, despite mixed empirical evidence as to the benefit of computing dimensionality reduction, it is frequently applied in predictive modeling research and application without either comparing to a model built using the full feature set or utilizing state-of-the-art predictive modeling techniques for complexity control. This presents a concern, as the survey reveals complexity control as one of the main reasons for employing dimensionality reduction. This lack of comparison is troubling in light of our empirical results. We experimentally evaluate the efficacy of dimensionality reduction in the context of a collection of predictive modeling problems from a large-scale published study.

We find that utilizing dimensionality reduction improves predictive performance only under certain, rather narrow, conditions. Specifically, under default regularization (complexity control) settings dimensionality reduction helps for the more difficult predictive problems (where the predictive performance of a model built using the original feature set is relatively lower), but it actually decreases the performance on the easier problems. More surprisingly, employing state-of-the-art methods for selecting regularization parameters actually eliminates any advantage that dimensionality reduction has! Since the value of building accurate predictive models for business analytics applications has been well-established, the resulting guidelines for the application of dimensionality reduction should lead to better research and managerial decisions.
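
The comparison described in the abstract (full feature set versus a matrix-factorization-reduced feature set, each under default and under selected regularization) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the choice of truncated SVD as the factorization, the ridge (L2-regularized least squares) classifier, the accuracy metric, the regularization grid, and all function names. The abstract does not say which models, metrics, or datasets the authors used, so the numbers this sketch prints say nothing about their findings.

```python
import numpy as np

def make_data(n=600, d=200, density=0.03, seed=0):
    """Synthetic stand-in for sparse behavioral data (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n, d)) < density).astype(float)  # sparse binary matrix
    w_true = rng.normal(size=d)
    score = X @ w_true + rng.normal(scale=0.5, size=n)
    y = np.where(score > np.median(score), 1.0, -1.0)  # balanced +/-1 labels
    return X, y

def svd_projection(X_train, k):
    """Top-k right singular vectors of the training matrix, shape (d, k)."""
    _, _, vt = np.linalg.svd(X_train, full_matrices=False)
    return vt[:k].T

def add_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized least squares; lam is the penalty strength."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def accuracy(X, y, w):
    """Fraction of examples whose predicted sign matches the +/-1 label."""
    pred = np.where(add_bias(X) @ w >= 0, 1.0, -1.0)
    return float(np.mean(pred == y))

def tune_lambda(X_tr, y_tr, X_val, y_val, grid):
    """Pick the penalty with the best accuracy on a held-out validation set."""
    return max(grid, key=lambda lam: accuracy(X_val, y_val, fit_ridge(X_tr, y_tr, lam)))

def compare(k=20, default_lam=1.0, grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Test accuracy for {full, svd} features x {default, tuned} regularization."""
    X, y = make_data()
    X_tr, X_val, X_te = X[:300], X[300:450], X[450:]
    y_tr, y_val, y_te = y[:300], y[300:450], y[450:]
    P = svd_projection(X_tr, k)  # fitted on training rows only
    results = {}
    for name, transform in (("full", lambda A: A), ("svd", lambda A: A @ P)):
        A_tr, A_val, A_te = transform(X_tr), transform(X_val), transform(X_te)
        w_default = fit_ridge(A_tr, y_tr, default_lam)
        best_lam = tune_lambda(A_tr, y_tr, A_val, y_val, grid)
        w_tuned = fit_ridge(A_tr, y_tr, best_lam)
        results[name + "/default"] = accuracy(A_te, y_te, w_default)
        results[name + "/tuned"] = accuracy(A_te, y_te, w_tuned)
    return results

if __name__ == "__main__":
    for setting, acc in compare().items():
        print(f"{setting}: {acc:.3f}")
```

The design point worth noting is that the projection is fitted on the training rows only and the regularization strength is chosen on a separate validation split, so the reduced and full models are compared on the same held-out test rows. A dense SVD is used here only because the toy matrix is small; data at the scale the abstract describes would call for a sparse, truncated solver instead.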
