Conference: Chinese Lexical Semantics Workshop

Clustering of News Topics Integrating the Relationship among News Elements

Abstract

To make full use of the structure of news documents and the relations among them, a news topic clustering method is proposed that uses the relations among document elements. First, term weights are computed with TF-IDF from word frequency statistics to build a vector space representation of each document, and pairwise document similarity is computed with a text similarity measure to obtain an initial similarity matrix. Next, the relations among news elements are used as semi-supervised constraint information to modify the initial similarity matrix, the news documents are clustered with the Affinity Propagation algorithm, and news topics are extracted from the resulting clusters, which completes the news topic model. Finally, comparative experiments were performed on a manually annotated news corpus. The results show that Affinity Propagation clustering that integrates the relations among document elements performs better than the same clustering without constraint information.
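The pipeline described in the abstract can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the similarity matrix is modified, so `apply_constraints` and its `boost` parameter are hypothetical. Cosine similarity stands in for the unnamed text similarity measure, and the toy documents and the `must_link` pair are invented for the example. The Affinity Propagation loop follows the standard responsibility and availability updates with damping.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Initial document-by-document similarity matrix."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def apply_constraints(sim, must_link, boost=0.5):
    """ASSUMPTION: raise similarity (capped at 1.0) for pairs related by news elements."""
    adjusted = [row[:] for row in sim]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = min(1.0, sim[i][j] + boost)
    return adjusted


def affinity_propagation(sim, damping=0.5, iterations=200):
    """Cluster from a similarity matrix (n >= 2); returns each point's exemplar index."""
    n = len(sim)
    s = [row[:] for row in sim]
    off_diag = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    preference = off_diag[len(off_diag) // 2]  # median similarity as self-preference
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            for k in range(n):
                rival = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        for i in range(n):
            for k in range(n):
                support = sum(max(0.0, r[j][k]) for j in range(n) if j != i and j != k)
                new = support if i == k else min(0.0, r[k][k] + support)
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    score = [r[k][k] + a[k][k] for k in range(n)]
    # Fall back to the single best-scoring point if nothing scores above zero.
    exemplars = [k for k in range(n) if score[k] > 0] or [score.index(max(score))]
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k]) for i in range(n)]


# Toy corpus (invented): two topics, three tokenized documents each.
docs = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "team", "city", "damage"],
    ["earthquake", "city", "damage"],
    ["election", "vote", "party"],
    ["election", "vote", "party", "president", "debate"],
    ["election", "president", "debate"],
]
# Hypothetical constraint: documents 0 and 2 report on the same place.
must_link = [(0, 2)]

sim = similarity_matrix(tfidf_vectors(docs))
adjusted = apply_constraints(sim, must_link)
labels = affinity_propagation(adjusted)
print(labels)  # one exemplar index per document
```

Setting each point's self-preference to the median similarity is a common default for Affinity Propagation. A symmetric two-document cluster can tie exactly under these updates, which is why library implementations usually add a small amount of noise to the similarities before iterating.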