Conference: WEBKDD International Workshop

Automatic Categorization of Web Pages and User Clustering with Mixtures of Hidden Markov Models


Abstract

We propose mixtures of hidden Markov models (HMMs) for modelling the clickstreams of web surfers. The page categorization is thus learned from the data, without the need for a (possibly cumbersome) manual categorization. We provide an EM algorithm for training a mixture of HMMs and show that additional static user data can easily be incorporated, possibly improving the labelling of users. Furthermore, we use prior knowledge to enhance generalization and avoid numerical problems. We use parameter tying to reduce both the danger of overfitting and the computational overhead. We put a flat prior on the parameters so that transitions between page categories that occur very rarely, or not at all, in the data nonetheless retain a nonzero probability. We demonstrate the usefulness of our approach in applications to artificial data and real-world web logs. We train a mixture of HMMs on artificial navigation patterns and show that the correct model is learned. Moreover, we show that the use of static 'satellite data' may enhance the labelling of shorter navigation patterns. When applying a mixture of HMMs to real-world web logs from a large Dutch commercial web site, we demonstrate that sensible page categorizations are learned.
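
As a rough illustration of two ingredients named in the abstract, the following Python sketch shows (a) the E-step quantity of a mixture of HMMs, namely the posterior probability that a clickstream was generated by each mixture component, and (b) a transition re-estimate that stays strictly positive for unseen transitions. It is not taken from the paper: the function names, the tuple layout `(start, trans, emit)`, and the additive `pseudo_count` used to stand in for the flat prior are all assumptions made for this example, and the paper's exact update rules may differ.

```python
import math

def forward_loglik(seq, start, trans, emit):
    """Log-likelihood of one symbol sequence under a single HMM.

    Uses the scaled forward recursion. Assumes every observed symbol has
    nonzero probability in at least one reachable state.
    """
    n = len(start)
    alpha = [start[i] * emit[i][seq[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
                for j in range(n)
            ]
        scale = sum(alpha)              # P(x_t | x_1..x_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def responsibilities(seq, weights, hmms):
    """Posterior probability of each mixture component given one sequence."""
    logs = [math.log(w) + forward_loglik(seq, *h) for w, h in zip(weights, hmms)]
    top = max(logs)                     # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def smoothed_transitions(expected_counts, pseudo_count=1.0):
    """Row-normalise expected transition counts after adding a pseudo-count.

    The pseudo-count is an assumed stand-in for the flat prior: it keeps
    every transition probability nonzero even when its count is zero.
    """
    rows = []
    for row in expected_counts:
        total = sum(row) + pseudo_count * len(row)
        rows.append([(c + pseudo_count) / total for c in row])
    return rows

# Hypothetical two-component mixture over two page symbols (0 and 1).
sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]])
uniform = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
posterior = responsibilities([0, 0, 0, 0], [0.5, 0.5], [sticky, uniform])
```

In a full EM loop these responsibilities would weight each sequence's contribution to the expected counts of every component before the smoothed re-estimate is applied; that loop is omitted here to keep the sketch short.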
