Journal: IEEE Transactions on Knowledge and Data Engineering

Mining User-Aware Rare Sequential Topic Patterns in Document Streams


Abstract

Textual documents created and distributed on the Internet are constantly changing and take various forms. Most existing work is devoted to topic modeling and the evolution of individual topics, while the sequential relations between topics in successive documents published by a specific user are ignored. In this paper, in order to characterize and detect personalized and abnormal behaviors of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a group of algorithms that solve this mining problem in three phases: preprocessing to extract probabilistic topics and identify sessions for different users, generating all STP candidates with (expected) support values for each user by pattern-growth, and selecting URSTPs through a user-aware rarity analysis of the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that the discovered patterns reflect users' characteristics.
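The three phases named in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm. It assumes each document has already been assigned a single deterministic topic label (the paper works with probabilistic topics and expected support), it enumerates candidates by brute force instead of pattern-growth, and it takes global support to be the mean of the per-user supports. All function names and thresholds (`max_len`, `min_user_support`, `max_global_support`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_patterns(session, max_len):
    """Return all distinct ordered topic subsequences of one session, up to max_len."""
    patterns = set()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(session)), n):
            patterns.add(tuple(session[i] for i in idx))
    return patterns


def user_supports(sessions, max_len):
    """Support of a pattern for one user: fraction of that user's sessions containing it."""
    counts = defaultdict(int)
    for session in sessions:
        for pattern in candidate_patterns(session, max_len):
            counts[pattern] += 1
    return {p: c / len(sessions) for p, c in counts.items()}


def mine_rare_patterns(sessions_by_user, max_len=2,
                       min_user_support=0.5, max_global_support=0.3):
    """Select patterns that are frequent for one user but rare across all users.

    Assumption: global support is the mean of per-user supports, and the two
    thresholds stand in for the paper's user-aware rarity analysis.
    """
    per_user = {u: user_supports(s, max_len) for u, s in sessions_by_user.items()}
    n_users = len(per_user)

    global_support = defaultdict(float)
    for supports in per_user.values():
        for pattern, value in supports.items():
            global_support[pattern] += value / n_users

    result = {}
    for user, supports in per_user.items():
        hits = sorted(
            p for p, v in supports.items()
            if v >= min_user_support and global_support[p] <= max_global_support
        )
        if hits:
            result[user] = hits
    return result
```

With four users, where only `u1` repeatedly writes about `travel` before `news` and the others write about `news` before `sports`, calling `mine_rare_patterns` on the sessions returns the `travel` patterns for `u1` and nothing for the other users, since their patterns are common across the population.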
