Conference: Knowledge Discovery and Data Mining, 2010. WKDD '10

Web Objects Clustering Using Transaction Log

Abstract

In this paper, we present a novel method for clustering web objects. Most existing methods are insufficient for discovering similar objects because the basic data they rely on, including object attributes, click-through data, and link data, are often sparse, scarce, or difficult to obtain. In contrast, we exploit the transaction log, which is more common and denser, but also noisier. To reduce the influence of noise, we calculate similarity in two steps. First, we use a basic similarity measure to discover each object's neighbors, and represent each object as a vector consisting of its neighbors. Second, we calculate the cosine similarity of these object vectors for clustering. Experiments on synthetic data show that our method is robust against noise: on noisy data, it increases precision by 10%. Finally, we present real clustering results on a movie dataset, achieving a coverage of 76% and a precision of 60%.
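The two-step similarity described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which basic similarity is used, how many neighbors are kept, or how the vectors are weighted, so the Jaccard overlap of transactions, the parameter `k`, the self-weight of 1.0, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def basic_similarity(transactions):
    """Step 1 (assumed measure): Jaccard overlap of the transactions
    in which each pair of objects appears."""
    occurs = defaultdict(set)
    for tid, items in enumerate(transactions):
        for obj in items:
            occurs[obj].add(tid)
    sim = defaultdict(dict)
    for a, b in combinations(sorted(occurs), 2):
        shared = len(occurs[a] & occurs[b])
        if shared:
            score = shared / len(occurs[a] | occurs[b])
            sim[a][b] = score
            sim[b][a] = score
    return sim, sorted(occurs)


def neighbor_vectors(sim, objects, k=2):
    """Represent each object by its k most similar neighbors,
    weighted by the basic similarity (hypothetical weighting)."""
    vectors = {}
    for obj in objects:
        ranked = sorted(sim.get(obj, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        vec = dict(ranked[:k])
        vec[obj] = 1.0  # assumption: an object counts as its own neighbor
        vectors[obj] = vec
    return vectors


def cosine(u, v):
    """Step 2: cosine similarity of two sparse neighbor vectors."""
    dot = sum(w * v[key] for key, w in u.items() if key in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy transaction log; the "A" in the last transaction plays the role of noise.
log = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["D", "E"], ["D", "E", "A"]]
sim, objects = basic_similarity(log)
vecs = neighbor_vectors(sim, objects, k=2)
print(cosine(vecs["A"], vecs["B"]), cosine(vecs["A"], vecs["D"]))
```

In this toy log, the second-step similarity between A and B comes out higher than that between A and D, even though the noisy transaction gives A and D a nonzero basic similarity. The resulting pairwise cosine values could then be passed to any standard clustering procedure; the abstract does not specify which one the authors use.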
