International Conference on Pattern Recognition Applications and Methods

An Episode-based Approach to Identify Website User Access Patterns



Abstract

Mining web access log data is a popular technique for identifying the frequent access patterns of website users. Many mining techniques, such as clustering, sequential pattern mining and association rule mining, can identify these frequent access patterns. Each can find interesting access patterns and group users, but none can identify the slight differences between the access patterns within an individual cluster, even though in reality these differences could carry important information about attacks. This paper introduces a methodology that identifies access patterns at a much finer level than traditional clustering techniques, such as nearest-neighbour-based techniques, and classification techniques. The technique uses the concept of episodes to represent web sessions and expresses these episodes as regular expressions. To the best of our knowledge, this is the first application of regular expressions to identifying user access patterns in web server log data. In addition to identifying frequent patterns, we demonstrate that the technique can identify rarely occurring access patterns, which traditional clustering mechanisms would simply treat as noise.
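The abstract does not define exactly how a session is split into episodes or how an episode becomes a regular expression. The Python sketch below assumes one simple, hypothetical encoding purely for illustration: each run of consecutive requests for the same page is one episode, written as `(?:page;)+` when the page repeats and `(?:page;)` otherwise, and sessions are grouped by the resulting pattern string. The `;` delimiter, the `rare_threshold` parameter and all function names are assumptions, not the paper's method. Patterns seen only a few times are reported as rare rather than discarded as noise.

```python
import re
from collections import Counter
from itertools import groupby

SEP = ";"  # assumed delimiter; must not occur in page paths


def encode_session(pages):
    """Flatten a session (list of requested pages) into one delimited string."""
    return "".join(page + SEP for page in pages)


def session_to_episode_regex(pages):
    """Hypothetical episode encoding: each run of consecutive requests for
    the same page becomes one episode, written as `(?:page;)+` for a repeated
    run or `(?:page;)` for a single request."""
    parts = []
    for page, run in groupby(pages):
        token = "(?:" + re.escape(page + SEP) + ")"
        run_length = sum(1 for _ in run)
        parts.append(token + "+" if run_length > 1 else token)
    return "^" + "".join(parts) + "$"


def group_by_pattern(sessions, rare_threshold=1):
    """Group sessions by their episode regex; patterns seen at most
    `rare_threshold` times are reported as rare instead of dropped as noise."""
    counts = Counter(session_to_episode_regex(s) for s in sessions)
    frequent = {p: c for p, c in counts.items() if c > rare_threshold}
    rare = {p: c for p, c in counts.items() if c <= rare_threshold}
    return frequent, rare


if __name__ == "__main__":
    sessions = [
        ["/home", "/products", "/products", "/cart"],
        ["/home", "/products", "/products", "/products", "/cart"],
        ["/home", "/products", "/cart"],
        ["/home", "/admin", "/admin", "/admin"],
    ]
    frequent, rare = group_by_pattern(sessions)
    for pattern, count in frequent.items():
        print("frequent", count, pattern)
    for pattern, count in rare.items():
        print("rare", count, pattern)
```

In this sample, the two sessions that browse products repeatedly share one pattern, while the single-visit session and the repeated `/admin` session each form their own rare pattern. Grouping by the pattern string, rather than by regex matching, is a deliberate choice here: `(?:/products;)+` would also match a single visit, so matching alone would merge sessions that this encoding keeps apart.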
