Journal: IEEE Transactions on Knowledge and Data Engineering

Topic Mining over Asynchronous Text Sequences



Abstract

Time-stamped texts, or text sequences, are ubiquitous in real-world applications. Multiple text sequences are often related to each other by sharing common topics. The correlation among these sequences provides more meaningful and comprehensive clues for topic mining than any individual sequence does on its own. However, exploring this correlation is nontrivial in the presence of asynchronism among the sequences, i.e., documents from different sequences about the same topic may have different time stamps. In this paper, we formally address this problem and put forward a novel algorithm based on the generative topic model. Our algorithm consists of two alternating steps: the first step extracts common topics from multiple sequences based on the adjusted time stamps provided by the second step; the second step adjusts the time stamps of the documents according to the time distribution of the topics discovered by the first step. We perform these two steps alternately, and the objective function is guaranteed to converge monotonically over the iterations. The effectiveness and advantages of our approach are demonstrated through extensive empirical studies on two real data sets consisting of six research paper repositories and two news article feeds, respectively.
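
The alternation described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a much simpler model in which each document belongs to exactly one topic, time is split into discrete slots, and a document may move at most `max_shift` slots from its original time stamp. The names `fit`, `normalize`, `doc_score`, `max_shift`, and the smoothing constant `alpha` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def normalize(counts, size, alpha=0.1):
    """Smoothed multinomial estimate over indices 0..size-1 (alpha is an assumed prior)."""
    total = sum(counts.values()) + alpha * size
    return [(counts.get(i, 0) + alpha) / total for i in range(size)]

def doc_score(words, t, z, phi, psi):
    """Log-likelihood of one document under topic z at time slot t."""
    return math.log(psi[z][t]) + sum(math.log(phi[z][w]) for w in words)

def fit(docs, n_topics, vocab_size, n_slots, max_shift=2, n_iter=10):
    """docs: list of (original_time_slot, [word ids]).

    Returns (topics, times, phi, psi): one topic and one adjusted time slot
    per document, plus per-topic word and time distributions.
    """
    times = [t for t, _ in docs]                        # adjusted time stamps
    topics = [i % n_topics for i in range(len(docs))]   # arbitrary deterministic init
    for _ in range(n_iter):
        # Step 1: re-estimate topics from the current (adjusted) time stamps.
        phi, psi = [], []
        for z in range(n_topics):
            word_counts, time_counts = Counter(), Counter()
            for (_, words), t, zd in zip(docs, times, topics):
                if zd == z:
                    word_counts.update(words)
                    time_counts[t] += 1
            phi.append(normalize(word_counts, vocab_size))
            psi.append(normalize(time_counts, n_slots))
        topics = [max(range(n_topics), key=lambda z: doc_score(w, t, z, phi, psi))
                  for (_, w), t in zip(docs, times)]
        # Step 2: move each document to the slot its topic finds most probable,
        # restricted to a window around the original time stamp (an assumption).
        new_times = []
        for (t0, _), z in zip(docs, topics):
            lo, hi = max(0, t0 - max_shift), min(n_slots - 1, t0 + max_shift)
            new_times.append(max(range(lo, hi + 1), key=lambda t: psi[z][t]))
        times = new_times
    return topics, times, phi, psi

# Toy usage: six documents over a 4-word vocabulary and 10 time slots.
example_docs = [(0, [0, 0, 1]), (1, [0, 1, 1]), (3, [0, 1, 0]),
                (6, [2, 3, 3]), (8, [3, 2, 2]), (9, [2, 2, 3])]
topics, times, phi, psi = fit(example_docs, n_topics=2, vocab_size=4, n_slots=10)
```

The shift window stands in for whatever constraint keeps documents from drifting arbitrarily far in time; without some such constraint, step 2 would collapse every document of a topic onto a single slot. Because of the smoothing and hard assignments, this sketch does not carry the monotonic convergence guarantee that the abstract claims for the actual method.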
