IEEE International Conference on Control Science and Systems Engineering

An improved pagerank algorithm based on fuzzy C-means clustering and information entropy

Abstract

This paper proposes an improvement to the PageRank algorithm. Most existing PageRank variants assume a strong correlation among consecutively accessed webpages; in reality the relationship is fuzzy, because a user may access pages on an arbitrary basis. We mine search-behavior logs for chronological sequential patterns and cluster all webpages using fuzzy C-means clustering. The weight of each cluster is determined using information entropy and is then used to adjust the average weight. The algorithm is tested on a sample of 1 million pages. Compared with traditional PageRank, the new algorithm decreases search time by 34.83% and increases search accuracy by 41.88%; compared with HITS, the improvements are 31.82% and 64.04%, respectively.
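The abstract does not give formulas, so the following is only a rough sketch of how the three named ingredients could fit together, not the authors' method. It assumes the standard fuzzy C-means membership update, the common "entropy weight method" for turning membership entropy into cluster weights, and a PageRank power iteration whose teleport vector is biased by those weights. All function names, the page features, and the link graph are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means membership update for fixed cluster centers."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
            continue
        rows.append([1.0 / sum((dk / dj) ** exponent for dj in dists)
                     for dk in dists])
    return rows


def entropy_cluster_weights(u):
    """Entropy weight method (an assumption, not taken from the paper):
    clusters whose membership is more concentrated get more weight.
    Requires at least two pages so that log(n) is non-zero."""
    n, c = len(u), len(u[0])
    divergences = []
    for k in range(c):
        column = [row[k] for row in u]
        total = sum(column)
        if total == 0.0:
            divergences.append(0.0)
            continue
        probs = [x / total for x in column if x > 0.0]
        h = -sum(p * math.log(p) for p in probs) / math.log(n)  # in [0, 1]
        divergences.append(max(0.0, 1.0 - h))
    s = sum(divergences)
    if s == 0.0:
        return [1.0 / c] * c
    return [d / s for d in divergences]


def pagerank(links, teleport, damping=0.85, iters=100):
    """Power iteration; links[i] lists the pages that page i points to.
    Rank from dangling pages is redistributed along the teleport vector."""
    n = len(links)
    rank = list(teleport)
    for _ in range(iters):
        dangling = sum(rank[i] for i in range(n) if not links[i])
        new = [(1.0 - damping + damping * dangling) * teleport[i]
               for i in range(n)]
        for i, outs in enumerate(links):
            for j in outs:
                new[j] += damping * rank[i] / len(outs)
        rank = new
    return rank


# Made-up data: 2-D page features (e.g. derived from access logs) and links.
points = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.1)]
centers = [(0.1, 0.05), (0.95, 1.0)]
links = [[1, 2], [0], [3], [2, 0]]

u = fcm_memberships(points, centers)
w = entropy_cluster_weights(u)
prior = [sum(u_ik * w_k for u_ik, w_k in zip(row, w)) for row in u]
teleport = [p / sum(prior) for p in prior]
print(pagerank(links, teleport))
```

Biasing the teleport vector is just one plausible place to inject the cluster weights; the paper's phrase "adjust the average weight" may refer to a different step, such as reweighting individual link transitions.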
