3rd ACM International Conference on Web Search and Data Mining (WSDM 2010)

fLDA: Matrix Factorization through Latent Dirichlet Allocation


Abstract

We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, a user's rating on an item is modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable, and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
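To make the rating model concrete, the sketch below illustrates the prediction step implied by the abstract: an item's factor is the average of the discrete topic assignments of its words, and the predicted rating combines bias terms with the user's topic-affinity vector applied to that averaged topic vector. The variable names, bias values, and random topic assignments are illustrative assumptions only; in the paper these quantities are learned jointly under Gaussian (user) and LDA (item) priors, not set by hand.

```python
import numpy as np

# Minimal sketch of the fLDA rating model described in the abstract.
# All concrete values here are toy assumptions, not the paper's fitted model.

rng = np.random.default_rng(0)
num_topics = 5  # K latent topics


def item_topic_vector(word_topic_assignments, num_topics):
    """Average per-word topic assignments into a K-dimensional item factor."""
    z_bar = np.zeros(num_topics)
    for topic in word_topic_assignments:
        z_bar[topic] += 1.0
    return z_bar / max(len(word_topic_assignments), 1)


# Toy item: 8 word occurrences, each assigned (here randomly) to a topic.
# In fLDA these assignments are drawn under an LDA prior and are also
# informed by observed ratings during the supervised fit.
word_topics = rng.integers(0, num_topics, size=8)
z_bar_j = item_topic_vector(word_topics, num_topics)

# User factor: affinity to each of the K topics. In fLDA this vector is
# regularized toward a Gaussian linear regression on user features.
u_i = rng.normal(scale=0.5, size=num_topics)

# Predicted rating: offsets plus the user-topic affinity applied to the
# item's averaged topic vector.
mu, alpha_i, beta_j = 3.5, 0.2, -0.1  # assumed global/user/item bias terms
predicted_rating = mu + alpha_i + beta_j + u_i @ z_bar_j
print(f"predicted rating: {predicted_rating:.2f}")
```

Because the item factor is built entirely from the item's words, a brand-new item with no ratings still receives a usable topic vector from the LDA prior, which is how the model handles the cold-start scenarios mentioned above.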

