Journal of the American Statistical Association

Personalized Prediction and Sparsity Pursuit in Latent Factor Models

Abstract

Personalized information filtering extracts the information specifically relevant to a user, predicting his or her preference over a large number of items based on the opinions of like-minded users or on the items' content. We cast this problem into the framework of regression and classification, integrating additional user-specific and content-specific predictors into partial latent models for higher predictive accuracy. In particular, we factorize a user-over-item preference matrix into a product of two matrices, one representing users' preferences and the other representing items' preferences by users. We then propose a likelihood method that seeks the sparsest latent factorization from a class of overcomplete factorizations, possibly with a high percentage of missing values. This promotes additional sparsity beyond rank reduction. Computationally, we design methods based on a decomposition and combination strategy that breaks large-scale optimization into many small subproblems, solved in a recursive and parallel manner. On this basis, we implement the proposed methods through multi-platform shared-memory parallel programming, and through Mahout, a library for scalable machine learning and data mining, for MapReduce computation. For example, our methods scale to a dataset of three billion observations on a single machine with sufficient memory, with good timings. Both theoretical and numerical investigations show that the proposed methods exhibit a significant improvement in accuracy over state-of-the-art scalable methods. Supplementary materials for this article are available online.
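
To make the factorization idea in the abstract concrete, here is a minimal sketch, not the authors' method. It assumes a squared-error loss on the observed entries and swaps in a plain L1 penalty with alternating proximal-gradient updates in place of the paper's likelihood-based sparsity pursuit. It also omits the user-specific and content-specific predictors. The names `sparse_factorize`, `soft_threshold`, `objective`, `rank`, and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink entries toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def objective(R, mask, A, B, lam):
    """Squared error on observed entries plus an L1 penalty on both factors."""
    resid = mask * (R - A @ B.T)
    return 0.5 * np.sum(resid ** 2) + lam * (np.abs(A).sum() + np.abs(B).sum())


def sparse_factorize(R, mask, rank=4, lam=0.05, n_iter=200, seed=0):
    """Alternating proximal-gradient updates for R ~ A @ B.T on observed cells.

    R    : (n_users, n_items) preference matrix; fill missing cells with 0.0
    mask : same shape, 1.0 where R is observed and 0.0 where it is missing
    Assumption: L1 penalty and squared loss stand in for the paper's method.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    A = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    B = 0.1 * rng.standard_normal((n_items, rank))  # item factors
    history = [objective(R, mask, A, B, lam)]
    for _ in range(n_iter):
        # Update A with B fixed; 1 / ||B||_2^2 is a safe (conservative) step.
        step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)
        grad_A = -(mask * (R - A @ B.T)) @ B
        A = soft_threshold(A - step * grad_A, step * lam)
        # Update B with A fixed, using the analogous step size.
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-8)
        grad_B = -(mask * (R - A @ B.T)).T @ A
        B = soft_threshold(B - step * grad_B, step * lam)
        history.append(objective(R, mask, A, B, lam))
    return A, B, history
```

Predicted preferences for every user and item, including the missing cells, are then `A @ B.T`. With `B` fixed, the update of `A` separates across user rows (and symmetrically for `B` across items), which hints at how a large problem could be split into small subproblems solved in parallel; the paper's actual decomposition and combination strategy may differ.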
