3rd ACM International Conference on Web Search and Data Mining (WSDM 2010)

fLDA: Matrix Factorization through Latent Dirichlet Allocation


Abstract

We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, a user's rating on an item is modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable, and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
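To make the rating model concrete, the sketch below illustrates the prediction step implied by the abstract: an item's factor is the average of the discrete topic assignments of its words, and the predicted rating combines bias terms with the user's topic-affinity vector applied to that averaged topic vector. The variable names, bias values, and random topic assignments are illustrative assumptions only; in the paper these quantities are learned jointly under Gaussian (user) and LDA (item) priors, not set by hand.

```python
import numpy as np

# Minimal sketch of the fLDA rating model described in the abstract.
# All concrete values here are toy assumptions, not the paper's fitted model.

rng = np.random.default_rng(0)
num_topics = 5  # K latent topics


def item_topic_vector(word_topic_assignments, num_topics):
    """Average per-word topic assignments into a K-dimensional item factor."""
    z_bar = np.zeros(num_topics)
    for topic in word_topic_assignments:
        z_bar[topic] += 1.0
    return z_bar / max(len(word_topic_assignments), 1)


# Toy item: 8 word occurrences, each assigned (here randomly) to a topic.
# In fLDA these assignments are drawn under an LDA prior and are also
# informed by observed ratings during the supervised fit.
word_topics = rng.integers(0, num_topics, size=8)
z_bar_j = item_topic_vector(word_topics, num_topics)

# User factor: affinity to each of the K topics. In fLDA this vector is
# regularized toward a Gaussian linear regression on user features.
u_i = rng.normal(scale=0.5, size=num_topics)

# Predicted rating: offsets plus the user-topic affinity applied to the
# item's averaged topic vector.
mu, alpha_i, beta_j = 3.5, 0.2, -0.1  # assumed global/user/item bias terms
predicted_rating = mu + alpha_i + beta_j + u_i @ z_bar_j
print(f"predicted rating: {predicted_rating:.2f}")
```

Because the item factor is built entirely from the item's words, a brand-new item with no ratings still receives a usable topic vector from the LDA prior, which is how the model handles the cold-start scenarios mentioned above.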

