Computational Statistics & Data Analysis

Model-based biclustering of clickstream data


Abstract

Navigation patterns, expressed as sequences of visited websites or page categories, can characterize the behavior and habits of users. Such routes through web pages are commonly called clickstreams. Clustering clickstream sequences is a recent and challenging problem with many applications. The main difficulty is that one must group sequences of categorical data rather than vectors, so most traditional clustering algorithms do not apply. The time-ordered nature of the data suggests that dynamic models hold more promise than static ones. Model-based clustering that relies on a mixture of first-order Markov models is considered. Because the number of distinct web pages, and therefore the number of states in the Markov process, can be very high, such a mixture model involves a large number of parameters. Grouping states by similarity is therefore also proposed to reduce the number of parameters in the model. States are then clustered along with users, which yields a biclustering framework. The developed methodology is illustrated on synthetic and real datasets with good results. (C) 2014 Elsevier B.V. All rights reserved.
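
To make the mixture of first-order Markov models mentioned in the abstract more concrete, the following is a minimal sketch of how such a mixture could be fitted with a plain EM algorithm. It is not the authors' implementation and it does not include the state-grouping (biclustering) step; the function names `sample_chain` and `fit_markov_mixture`, the smoothing constant `alpha`, and the toy transition matrices are all assumptions made for illustration.

```python
import numpy as np


def sample_chain(P, length, rng):
    """Draw one state sequence from a first-order Markov chain with transition matrix P."""
    n_states = P.shape[0]
    state = int(rng.integers(n_states))
    seq = [state]
    for _ in range(length - 1):
        state = int(rng.choice(n_states, p=P[state]))
        seq.append(state)
    return seq


def fit_markov_mixture(sequences, n_states, n_clusters, n_iter=50, seed=0, alpha=1e-2):
    """EM for a mixture of first-order Markov chains (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(sequences)
    # Sufficient statistics per sequence: initial-state indicator and transition counts.
    init = np.zeros((n, n_states))
    counts = np.zeros((n, n_states, n_states))
    for i, seq in enumerate(sequences):
        init[i, seq[0]] = 1.0
        for a, b in zip(seq[:-1], seq[1:]):
            counts[i, a, b] += 1.0
    # Start from random soft assignments of sequences to clusters.
    resp = rng.dirichlet(np.ones(n_clusters), size=n)
    for _ in range(n_iter):
        # M-step: weighted relative frequencies, smoothed by alpha (an assumption)
        # so that no probability is exactly zero.
        weights = (resp.sum(axis=0) + alpha) / (n + n_clusters * alpha)
        start = resp.T @ init + alpha
        start /= start.sum(axis=1, keepdims=True)
        trans = np.einsum("ik,iab->kab", resp, counts) + alpha
        trans /= trans.sum(axis=2, keepdims=True)
        # E-step: posterior cluster probabilities, computed in log space.
        log_post = (np.log(weights)[None, :]
                    + init @ np.log(start).T
                    + np.einsum("iab,kab->ik", counts, np.log(trans)))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, start, trans, resp


# Toy data: 20 "sticky" users and 20 "cycling" users over 3 page categories.
demo_rng = np.random.default_rng(1)
sticky = np.full((3, 3), 0.05)
np.fill_diagonal(sticky, 0.9)
cycling = np.full((3, 3), 0.05)
for s in range(3):
    cycling[s, (s + 1) % 3] = 0.9
demo_seqs = ([sample_chain(sticky, 30, demo_rng) for _ in range(20)]
             + [sample_chain(cycling, 30, demo_rng) for _ in range(20)])
_, _, demo_trans, demo_resp = fit_markov_mixture(demo_seqs, n_states=3, n_clusters=2)
demo_labels = demo_resp.argmax(axis=1)  # hard cluster label per user
```

With S states and K mixture components, this unrestricted model has (K - 1) + K(S - 1) + K S(S - 1) free parameters, which grows quadratically in S. That growth is the motivation the abstract gives for also grouping similar states.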
