Journal: Decision Support Systems

Detecting short-term cyclical topic dynamics in the user-generated content and news



Abstract

With the maturation of the Internet and mobile technology, Internet users are now able to produce and consume text data in different contexts. Linking the context to the text data can provide valuable information regarding users' activities and preferences, which is useful for decision support tasks such as market segmentation and product recommendation. To this end, previous studies have proposed incorporating contextual information, such as authors' identities and timestamps, into topic models. Despite these efforts, few studies have focused on the short-term cyclical topic dynamics that connect changes in topic occurrences to the time of day, the day of the week, and the day of the month. Short-term cyclical topic dynamics can both characterize the typical contexts to which a user is exposed on different occasions and identify user habits in specific contexts. Both abilities are essential for decision support tasks that are context dependent. To address this challenge, we present the Probit-Dirichlet hybrid allocation (PDHA) topic model, which incorporates a document's temporal features to capture a topic's short-term cyclical dynamics. A document's temporal features enter the topic model through the regression covariates of a multinomial-Probit-like structure that influences the prior topic distribution of individual tokens. By incorporating temporal features for monthly, weekly, and daily cyclical dynamics, PDHA is able to capture short-term cyclical patterns that characterize topic dynamics. We developed an augmented Gibbs sampling algorithm for the non-Dirichlet-conjugate setting in PDHA. We then demonstrated the utility of PDHA using text collections from user-generated content, newswires, and newspapers. Our experiments show that PDHA achieves higher hold-out likelihood values than baseline models, including latent Dirichlet allocation (LDA) and Dirichlet-multinomial regression (DMR). The temporal features for short-term cyclical dynamics and the novel model structure of PDHA both contribute to this performance advantage. The results suggest that PDHA is an attractive approach for decision support tasks involving text mining.
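
The abstract does not say how the temporal features are encoded or how the Probit-like structure is parameterized, so the Python sketch below is only an illustration under stated assumptions, not the PDHA model itself. It assumes sine/cosine encodings for the daily, weekly, and monthly cycles (with a nominal 31-day month) and a latent-utility formulation in which each topic's utility is a linear function of the features plus standard normal noise, with the prior topic probability taken as the chance that a topic has the largest utility. The names `cyclical_features`, `probit_topic_prior`, and the coefficient matrix `beta` are hypothetical.

```python
import math
from datetime import datetime

import numpy as np


def cyclical_features(ts: datetime) -> np.ndarray:
    """Encode a timestamp as an intercept plus sin/cos pairs for three cycles.

    Assumption: sin/cos encoding with periods of 24 hours, 7 days, and a
    nominal 31-day month. The abstract does not specify the encoding.
    """
    def pair(value: float, period: float) -> list:
        angle = 2.0 * math.pi * value / period
        return [math.sin(angle), math.cos(angle)]

    hour = ts.hour + ts.minute / 60.0
    feats = [1.0]                        # intercept
    feats += pair(hour, 24.0)            # time of day
    feats += pair(ts.weekday(), 7.0)     # day of week (Monday = 0)
    feats += pair(ts.day - 1, 31.0)      # day of month
    return np.array(feats)


def probit_topic_prior(x, beta, n_draws=20000, seed=0):
    """Monte Carlo estimate of a multinomial-probit-style topic prior.

    Assumption: topic k has latent utility beta[k] @ x + eps_k with
    eps_k ~ N(0, 1) independently, and the prior probability of topic k
    is the probability that its utility is the largest.
    """
    rng = np.random.default_rng(seed)
    n_topics = beta.shape[0]
    mean_utility = beta @ x                              # shape (K,)
    noise = rng.standard_normal((n_draws, n_topics))     # shape (draws, K)
    winners = np.argmax(mean_utility + noise, axis=1)
    return np.bincount(winners, minlength=n_topics) / n_draws


if __name__ == "__main__":
    x = cyclical_features(datetime(2020, 1, 3, 21, 30))  # a Friday evening
    beta = np.zeros((3, x.size))                         # 3 hypothetical topics
    beta[1, 1] = -1.5   # topic 1 responds to the time-of-day sine term
    print(probit_topic_prior(x, beta))
```

Under these assumptions, changing a topic's coefficients on the sine and cosine terms shifts its prior probability at particular hours or days, which is one simple way a cyclical pattern could be expressed through regression covariates.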
