ACM Transactions on Information Systems

Inferring Dynamic User Interests in Streams of Short Texts for User Clustering

Abstract

User clustering has been studied from different angles. To identify shared interests, behavior-based methods look for users with similar browsing or search patterns, whereas content-based methods use information from the contents of the documents the users visited. So far, content-based user clustering has mostly focused on static sets of relatively long documents. Given the dynamic nature of social media, there is a need to cluster users dynamically in the context of streams of short texts. User clustering in this setting is more challenging than with long documents, because it is difficult to capture users' dynamic topic distributions when data are sparse. To address this problem, we propose a dynamic user clustering topic model (UCT). UCT adaptively tracks changes in each user's time-varying topic distribution based both on the short texts the user posts during a given time period and on previously estimated distributions. To infer these changes, we propose a Gibbs sampling algorithm that samples over a set of word pairs constructed for each user. UCT can be used in two ways: (1) as a short-term dependency model that infers a user's current topic distribution from the user's topic distribution during the previous time period only, and (2) as a long-term dependency model that infers a user's current topic distribution from the user's topic distributions during multiple past time periods. In contrast to many other clustering algorithms, the clustering results are explainable and human-understandable. For evaluation, we use a dataset consisting of users and the tweets each user posted. Experimental results demonstrate the effectiveness of the proposed short-term and long-term dependency user clustering models compared with state-of-the-art baselines.
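The abstract describes UCT only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions that are ours, not the paper's. Word pairs are taken to be all unordered pairs of distinct words within one short text. The topic-word distributions `phi` are held fixed. A user's Dirichlet prior for the current period is built from earlier topic distributions: the most recent one for the short-term variant, or an exponentially decayed average for the long-term variant. Each user is then clustered by their dominant topic. All function names and parameters (`word_pairs`, `dynamic_prior`, `gibbs_user_topics`, `assign_cluster`, `decay`, `scale`, `floor`) are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def word_pairs(texts: Sequence[str]) -> List[Pair]:
    """Unordered pairs of distinct words that co-occur in the same short text."""
    pairs: List[Pair] = []
    for text in texts:
        words = sorted(set(text.lower().split()))
        pairs.extend(itertools.combinations(words, 2))
    return pairs


def dynamic_prior(past_thetas: Sequence[Sequence[float]], long_term: bool = True,
                  decay: float = 0.5, scale: float = 10.0,
                  floor: float = 0.1) -> List[float]:
    """Hypothetical Dirichlet prior built from a user's earlier topic distributions.

    Short-term: only the most recent period is used.
    Long-term: all periods, weighted by decay ** age (age 0 = most recent).
    """
    history = list(past_thetas) if long_term else list(past_thetas)[-1:]
    if not history:
        raise ValueError("need at least one previous topic distribution")
    n = len(history)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total_w = sum(weights)
    num_topics = len(history[0])
    mixed = [sum(w * theta[k] for w, theta in zip(weights, history)) / total_w
             for k in range(num_topics)]
    # floor keeps every Dirichlet parameter strictly positive
    return [scale * m + floor for m in mixed]


def gibbs_user_topics(pairs: Sequence[Pair], phi: Sequence[Dict[str, float]],
                      alpha: Sequence[float], iters: int = 200,
                      seed: int = 0) -> List[float]:
    """Collapsed Gibbs sampling of one topic per word pair, with phi held fixed.

    p(z_b = k | rest) is proportional to
    (n_k excluding b + alpha_k) * phi_k(w1) * phi_k(w2).
    """
    rng = random.Random(seed)
    num_topics = len(alpha)
    z = [rng.randrange(num_topics) for _ in pairs]  # random initial topics
    counts = [0] * num_topics
    for k in z:
        counts[k] += 1
    for _ in range(iters):
        for i, (w1, w2) in enumerate(pairs):
            counts[z[i]] -= 1  # remove pair i from the counts
            weights = [(counts[k] + alpha[k])
                       * phi[k].get(w1, 1e-6) * phi[k].get(w2, 1e-6)
                       for k in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[z[i]] += 1  # add pair i back under its new topic
    denom = len(pairs) + sum(alpha)
    return [(counts[k] + alpha[k]) / denom for k in range(num_topics)]


def assign_cluster(theta: Sequence[float]) -> int:
    """Hypothetical rule: assign a user to the cluster of their dominant topic."""
    return max(range(len(theta)), key=lambda k: theta[k])


if __name__ == "__main__":
    phi = [  # hypothetical topic-word distributions for two topics
        {"goal": 0.3, "match": 0.3, "team": 0.3, "vote": 0.05, "election": 0.05},
        {"vote": 0.3, "election": 0.3, "poll": 0.3, "goal": 0.05, "team": 0.05},
    ]
    alpha = dynamic_prior([[0.5, 0.5], [0.8, 0.2]])  # two earlier periods
    pairs = word_pairs(["Great goal by the team", "what a match"])
    theta = gibbs_user_topics(pairs, phi, alpha)
    print(theta, assign_cluster(theta))
```

Holding `phi` fixed makes each user's sampler independent and keeps the example short. A full topic model would also re-estimate `phi`, so treat this only as an illustration of how a prior derived from past periods can enter the sampling weights.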