A probabilistic relational approach for web document clustering

Abstract

The exponential growth of information available on the World Wide Web and retrievable by search engines has made it necessary to develop efficient and effective methods for organizing relevant content. Document clustering plays an important role here and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content information and the hyperlink structure of a web page collection, where a document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization.

Bibliographic record

  • Source
    Information Processing & Management | 2010, Issue 2 | pp. 117-130 | 14 pages
  • Author affiliations

    Dipartimento di Informatica Sistemistica e Comunicazione, Universita degli Studi di Milano-Bicocca, Italy;

    Dipartimento di Informatica Sistemistica e Comunicazione, Universita degli Studi di Milano-Bicocca, Italy;

    Dipartimento di Informatica Sistemistica e Comunicazione, Universita degli Studi di Milano-Bicocca, Italy, and Consorzio Milano Ricerche, Via Cozzi 53, 20126 Milano, Italy;

  • Original format: PDF
  • Language: English
  • Keywords

    relational document clustering; relational web structure estimation;

