Journal: IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans

Utilizing Different Link Types to Enhance Document Clustering Based on Markov Random Field Model With Relaxation Labeling

Abstract

With the fast-growing number of works that use link information to enhance unsupervised document clustering, a comparative evaluation of the impacts of different link types on document clustering has become necessary. Text documents are connected by various types of links: explicit links such as citation links and hyperlinks, implicit links such as coauthorship and cocitation links, and similarity links such as content similarity links. These links convey topic similarity or topic transferring patterns, which are very useful for document clustering. In this paper, we adopt a clustering algorithm based on a Markov random field and relaxation labeling, which employs both content and linkage information, to evaluate the effectiveness of the aforementioned types of links for document clustering on ten data sets. The experimental results show that linkage information is quite effective in improving content-based document clustering. Furthermore, our experiments yield a series of important findings regarding the impacts of different link types on document clustering.
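
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a relaxation-labeling-style procedure might combine content and link evidence; it is not the authors' algorithm. It assumes each document starts with content-based cluster probabilities, that links are undirected with a scalar weight, and that a document's score for a cluster is its content probability multiplied by an exponential of the weighted support from its neighbors. The function name `relaxation_labeling`, the parameter `beta`, and the toy data are all hypothetical.

```python
import math


def relaxation_labeling(content_probs, links, beta=1.0, iterations=10):
    """Illustrative sketch only (not the paper's exact method).

    content_probs: one list of cluster probabilities per document,
                   assumed to come from a content-only model.
    links: dict mapping an undirected pair (i, j) to a link weight.
    beta: assumed strength of the link term relative to content.
    Returns the most probable cluster index for each document.
    """
    n = len(content_probs)
    k = len(content_probs[0])

    # Build an adjacency list so each document can see its neighbors.
    neighbors = [[] for _ in range(n)]
    for (i, j), weight in links.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))

    probs = [row[:] for row in content_probs]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            scores = []
            for c in range(k):
                # Support: how strongly linked neighbors currently favor cluster c.
                support = sum(w * probs[j][c] for j, w in neighbors[i])
                scores.append(content_probs[i][c] * math.exp(beta * support))
            total = sum(scores)
            updated.append([s / total for s in scores])
        probs = updated  # synchronous update of all documents

    return [max(range(k), key=row.__getitem__) for row in probs]


# Toy data: document 1 is ambiguous by content but strongly linked to document 0.
content_probs = [[0.9, 0.1], [0.45, 0.55], [0.1, 0.9], [0.2, 0.8]]
links = {(0, 1): 2.0, (2, 3): 1.0}
labels = relaxation_labeling(content_probs, links)
```

In this toy setup, document 1 leans slightly toward cluster 1 on content alone, but its link to document 0 pulls it into cluster 0. Different link types (citation, coauthorship, content similarity) could be compared in such a sketch by swapping the `links` dictionary while holding `content_probs` fixed.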
