SIAM International Conference on Data Mining

On Using Page Cooccurrences for Computing Clickstream Similarity


Abstract

Clickstream analysis provides valuable insight into user behavior, which can be translated into better business opportunities and increased user satisfaction. A fundamental problem in clickstream analysis is computing the distance (or similarity) between two clickstreams. Although a considerable body of literature proposes methods for computing path similarity, these methods rely on the edit distance, or the related longest common subsequence, to align the two clickstreams. The edit distance provides a least-cost sequence of transformations that makes the two clickstreams identical. Measures of path similarity are often defined on these "aligned" clickstreams. However, the replacement cost used in the edit distance's alignment process is assumed to be fixed, which ignores the degree of similarity between the two page views. This paper proposes a method for computing the replacement cost, based on the assumption that the degree of similarity between two page views is proportional to their relative frequency of cooccurrence. We define a method, which accounts for the order of the sequence as well as the time spent on each page, for obtaining the replacement cost of two arbitrary web pages. Though the method is less accurate than content-based analysis, our experiments with data generated from a simulator and with data from an actual web site show that the assumption is well founded and that the proposed method provides a fast and accurate way of computing the similarity between two page views.
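The idea of a similarity-dependent replacement cost can be illustrated with a minimal Python sketch. It is not the paper's formula, which the abstract does not give. As an assumption, page similarity is approximated here by the Jaccard overlap of the sessions in which two pages appear, and the replacement cost is taken as one minus that similarity. The sketch ignores page order within a session and the time spent on each page, both of which the paper's method is said to include. The names `cooccurrence_similarity`, `replacement_cost`, and `edit_distance` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(sessions):
    """Map each page pair (a, b), with a < b, to a similarity in [0, 1].

    Assumption: similarity is the Jaccard overlap of the sets of
    sessions containing each page. The paper's measure may differ.
    """
    page_count = Counter()   # number of sessions containing each page
    pair_count = Counter()   # number of sessions containing both pages
    for session in sessions:
        pages = sorted(set(session))
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))
    return {
        (a, b): both / (page_count[a] + page_count[b] - both)
        for (a, b), both in pair_count.items()
    }


def replacement_cost(a, b, sim):
    """Cost of substituting page a with page b: low for similar pages."""
    if a == b:
        return 0.0
    key = (a, b) if a < b else (b, a)
    return 1.0 - sim.get(key, 0.0)  # pages never seen together cost 1.0


def edit_distance(s, t, sim, indel=1.0):
    """Edit distance between clickstreams s and t with a variable
    replacement cost and a fixed insertion/deletion cost."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [i * indel]
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + indel,                            # delete a
                cur[j - 1] + indel,                         # insert b
                prev[j - 1] + replacement_cost(a, b, sim),  # replace a with b
            ))
        prev = cur
    return prev[-1]
```

With a fixed replacement cost of 1, any two clickstreams that differ in one page would be equally far apart. Under this sketch, swapping in a page that often shares sessions with the original costs less than swapping in an unrelated page.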
