
W-kmeans: Clustering News Articles Using WordNet


Abstract

Document clustering is a powerful technique that has been widely used to organize data into smaller, manageable information kernels. Several approaches have been proposed, but they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. We propose enhancing the standard kmeans algorithm with external knowledge from WordNet hypernyms in two ways: enriching the "bag of words" used before the clustering process and assisting the label generation procedure that follows it. Our experiments showed a significant improvement over standard kmeans on a corpus of news articles drawn from major news portals. Moreover, the cluster labeling process generates useful, high-quality cluster tags.
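The abstract names two places where hypernyms enter the pipeline: the bag of words before clustering, and the labels after it. The following is a minimal sketch of both steps, not the authors' implementation. It assumes a small hand-written table, `TOY_HYPERNYMS`, as a stand-in for real WordNet lookups; the function names `enrich`, `cosine`, and `label_cluster`, the `weight` parameter, and the frequency-based labeling rule are all hypothetical choices made for illustration. The clustering step itself is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet hypernym lookups (illustrative data only).
TOY_HYPERNYMS = {
    "striker": ["player", "athlete"],
    "goalkeeper": ["player", "athlete"],
    "senator": ["politician", "leader"],
    "minister": ["politician", "leader"],
}


def enrich(tokens, hypernyms=TOY_HYPERNYMS, weight=1):
    """Return a bag of words extended with the hypernyms of each token."""
    bag = Counter(tokens)
    for tok in tokens:
        for hyper in hypernyms.get(tok, []):
            bag[hyper] += weight  # assumed weighting; the abstract gives none
    return bag


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_cluster(docs, hypernyms=TOY_HYPERNYMS, top_n=1):
    """Label a cluster with the hypernyms its documents share most often."""
    counts = Counter()
    for tokens in docs:
        for tok in tokens:
            counts.update(hypernyms.get(tok, []))
    return [word for word, _ in counts.most_common(top_n)]


doc_a = ["striker", "scores"]
doc_b = ["goalkeeper", "saves"]

plain_sim = cosine(Counter(doc_a), Counter(doc_b))
enriched_sim = cosine(enrich(doc_a), enrich(doc_b))
labels = label_cluster([doc_a, doc_b], top_n=2)
```

In this toy case the two documents share no surface words, so their plain similarity is zero, while the enriched bags overlap on the shared hypernyms. That is the synonymy problem the abstract mentions, in miniature. A real pipeline would replace the table with WordNet queries and feed the enriched vectors to a kmeans implementation.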
