Conference: Annual meeting of the Association for Computational Linguistics

Won't somebody please think of the children? Improving Topic Model Clustering of Newspaper Comments for Summarisation

Abstract

Online newspaper articles can accumulate comments at volumes that prevent close reading. Summarisation of the comments allows interaction at a higher level and can lead to an understanding of the overall discussion. Comment summarisation requires topic clustering, comment ranking and extraction. Clustering must be robust, as the subsequent extraction relies on a good set of clusters. Comment data, as with many social media datasets, contains very short documents, and the number of words in a document is a limiting factor on the performance of LDA clustering. We evaluate whether comments can be combined into larger documents to improve the quality of the clusters. We find that combining comments with the comments that reply to them produces the highest-quality clusters.
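
As a minimal sketch of the combination step described in the abstract, the Python below merges each comment with its direct replies into one longer document. The record layout, the sample comments and the function name `combine_with_replies` are hypothetical and chosen only for illustration; the abstract does not specify how the authors represent threads or whether a reply also forms its own document, so both are assumptions here.

```python
from collections import defaultdict

# Hypothetical comment records: (comment_id, parent_id or None, text).
comments = [
    ("c1", None, "School funding should be the priority."),
    ("c2", "c1", "Agreed, class sizes are far too large."),
    ("c3", "c1", "Funding alone will not fix teaching quality."),
    ("c4", None, "The new bus routes are a mess."),
    ("c5", "c4", "My commute doubled after the change."),
]


def combine_with_replies(comments):
    """Return one document per comment: its text plus its direct replies.

    Assumption: replies are appended in their original order, and a reply
    still yields its own (possibly short) document.
    """
    texts = {cid: text for cid, _, text in comments}

    # Map each parent comment to the ids of the comments replying to it.
    replies = defaultdict(list)
    for cid, parent, _ in comments:
        if parent is not None:
            replies[parent].append(cid)

    # Concatenate each comment with the text of its direct replies.
    docs = {}
    for cid, _, _ in comments:
        parts = [texts[cid]] + [texts[r] for r in replies.get(cid, [])]
        docs[cid] = " ".join(parts)
    return docs


docs = combine_with_replies(comments)
for cid, doc in docs.items():
    print(cid, "->", doc)
```

The combined documents would then be tokenised and passed to an LDA implementation of the reader's choice in place of the individual comments, so that each input document contains more words.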