Conference: European Conference on Information Retrieval Research

A Graph-Based Approach to Topic Clustering for Online Comments to News



Abstract

This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For clustering, we propose a linear regression model of similarity between graph nodes (comments), based on similarity features whose weights are trained on automatically derived training data. To label the clusters, our graph-based approach uses DBPedia to abstract the topics extracted from the clusters. We evaluate the clustering approach against gold-standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labeling is set up as a retrieval task in which human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline, and our evaluation of abstractive cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
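The similarity-graph idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two features (cosine and Jaccard token overlap), the weight values, the edge threshold, and the use of connected components as the clustering step are all assumptions chosen for illustration. The abstract says only that the weights are learned by linear regression from automatically derived training data; here they are fixed by hand.

```python
import math
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase whitespace tokenization, keeping alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between the token sets of two comments."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical weights: in the paper these would come from a trained
# linear regression model; the values below are made up for the sketch.
WEIGHTS = {"bias": 0.0, "cosine": 0.6, "jaccard": 0.4}


def similarity(a, b):
    """Linear combination of the similarity features for one comment pair."""
    return (
        WEIGHTS["bias"]
        + WEIGHTS["cosine"] * cosine(a, b)
        + WEIGHTS["jaccard"] * jaccard(a, b)
    )


def cluster(comments, threshold=0.3):
    """Link comments whose score reaches the threshold, then return the
    connected components of the graph (an assumed clustering step)."""
    toks = [tokens(c) for c in comments]
    parent = list(range(len(comments)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if similarity(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `cluster` on a list of comment strings returns lists of comment indices, one list per cluster. Replacing the hand-set `WEIGHTS` with coefficients fitted on labeled comment pairs would bring the sketch closer to the regression setup the abstract describes.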
