European Conference on Information Retrieval Research
A Graph-Based Approach to Topic Clustering for Online Comments to News

Abstract

This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering, we propose a linear regression model of similarity between the graph nodes (comments), which combines similarity features using weights trained on automatically derived training data. To label the clusters, our graph-based approach uses DBPedia to abstract the topics extracted from the clusters. We evaluate the clustering approach against gold-standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labeling is set up as a retrieval task in which human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline, and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
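
To make the clustering idea concrete, here is a minimal sketch of a linear similarity model over comment pairs feeding a similarity graph. It is an illustration only, under several assumptions: the abstract does not name the similarity features, the learned weights, or the graph clustering algorithm, so the two features (`jaccard`, `cosine`), the values in `WEIGHTS` and `INTERCEPT`, the `threshold`, and the use of connected components as the clustering step are all hypothetical stand-ins rather than the paper's method.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase word tokens of a comment."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    """Token-set overlap between two token lists (assumed feature)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def cosine(a, b):
    """Cosine similarity between raw term-count vectors (assumed feature)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical weights and intercept. In the paper these are trained by
# regression on automatically derived data; the values here are made up.
WEIGHTS = {"jaccard": 0.5, "cosine": 0.5}
INTERCEPT = 0.0


def similarity(a, b):
    """Linear combination of the pairwise similarity features."""
    ta, tb = tokens(a), tokens(b)
    return (
        INTERCEPT
        + WEIGHTS["jaccard"] * jaccard(ta, tb)
        + WEIGHTS["cosine"] * cosine(ta, tb)
    )


def cluster(comments, threshold=0.3):
    """Link comments whose similarity reaches the threshold and return the
    connected components of that graph as sorted lists of comment indices.
    Connected components are a simple stand-in for the paper's clustering."""
    parent = list(range(len(comments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if similarity(comments[i], comments[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `cluster` on a list of comment strings returns groups of indices into that list. In a trained setting, the weights would be fitted on labeled comment pairs instead of being fixed by hand, and the threshold would be tuned on held-out data.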
