Detecting hot topics in technology news streams

Abstract

Detecting fine-grained hot topics in technology news streams is an interesting and important problem, given the large volume of reports and the relatively narrow range of topics. This paper proposes a three-phase method. In the first phase, a document-topic distribution vector is generated and keywords are extracted for each document using the pachinko allocation topic model. In the second phase, the documents are clustered with affinity propagation, based on the document-topic distribution vectors obtained in the first phase. In the last phase, actual events, denoted by combinations of keywords within each cluster, are identified using frequent pattern mining algorithms. We evaluate our approach on a collection of technology news reports gathered from various sites over a fixed time period. The results show that the method is effective.
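
The abstract does not say which frequent pattern mining algorithm is used, so the following is only a minimal Apriori-style sketch of the third phase, assuming each document in a cluster has already been reduced to a set of keywords. The function name `frequent_keyword_sets`, the `min_support` and `max_size` parameters, and the sample cluster are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_sets(docs, min_support, max_size=3):
    """Return {frozenset(keywords): support} for every keyword combination
    found in at least `min_support` documents (Apriori-style search).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    docs = [frozenset(d) for d in docs]

    # Level 1: count single keywords and keep the frequent ones.
    counts = Counter(k for d in docs for k in d)
    current = {frozenset([k]): c for k, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Only keywords that survived the previous level can appear in a
        # larger frequent set (the Apriori property).
        items = sorted({k for s in current for k in s})
        current = {}
        for combo in combinations(items, size):
            cand = frozenset(combo)
            # Prune: every (size - 1)-subset must already be frequent.
            if any(frozenset(sub) not in result
                   for sub in combinations(combo, size - 1)):
                continue
            support = sum(1 for d in docs if cand <= d)
            if support >= min_support:
                current[cand] = support
        result.update(current)
    return result


# Invented sample data: keyword sets for the documents of one cluster.
cluster = [
    {"apple", "iphone", "launch"},
    {"apple", "iphone", "price"},
    {"apple", "iphone", "launch", "event"},
    {"google", "android", "update"},
]
events = frequent_keyword_sets(cluster, min_support=2)
```

Here the larger keyword combinations in `events` play the role of candidate events for the cluster. The `min_support` threshold controls how many reports must share a combination before it counts, and would need tuning on real data.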
