Conference: 2011 IEEE Recent Advances in Intelligent Computational Systems

A Fuzzy Set Theoretic approach to discover user sessions from web navigational data

Abstract

Due to the continuous growth and increasing complexity of the WWW, web site publishers face increasing difficulty in attracting and retaining users. In order to design attractive web sites, designers must understand their users' needs. Analysing the navigational behaviour of users is therefore an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage data in order to discover patterns that can be used to analyse users' navigational behaviour. Preprocessing, knowledge extraction and results analysis are the three main steps of WUM. Because web logs contain a large amount of irrelevant information, the original log file cannot be used directly in the WUM process. During the preprocessing stage of WUM, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. This sessionized data can be used as the input for a variety of data mining tasks such as clustering, association rule mining and sequence mining. If the data mining task at hand is clustering, the session files are filtered to remove very small sessions in order to eliminate noise from the data. But direct removal of these small sessions may result in the loss of a significant amount of information, especially when the number of small sessions is large. We propose a “Fuzzy Set Theoretic” approach to deal with this problem. Instead of directly removing all the small sessions below a specified threshold, we assign weights to all the sessions using a “Fuzzy Membership Function” based on the number of URLs accessed by the sessions. After assigning the weights we apply a “Fuzzy c-Means Clustering” algorithm to discover the clusters of user profiles. In this paper, we provide a detailed review of various techniques to preprocess the web log data, including data fusion, data cleaning, user identification and session identification. We also describe our methodology to perform feature selection (or dimensionality reduction) and session weight assignment tasks. Finally, we compare our soft computing based approach of session weight assignment with the traditional hard computing based approach of small session elimination.
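The contrast between hard session elimination and fuzzy session weighting can be illustrated with a minimal sketch. The abstract does not give the form of the membership function or say how the weights enter the clustering step, so everything specific here is an assumption made for illustration: the linear ramp in `session_weight`, its `low` and `high` thresholds, the `hard_filter` cutoff, and the choice to let each session's weight scale its contribution to the cluster centres in `weighted_fcm`. The toy session matrix is invented and is not data from the paper.

```python
import numpy as np

def session_weight(n_urls, low=1, high=5):
    """Assumed ramp membership: 0 at or below `low` URLs, 1 at or above `high`,
    linear in between. The paper's actual function may differ."""
    n = np.asarray(n_urls, dtype=float)
    return np.clip((n - low) / (high - low), 0.0, 1.0)

def hard_filter(n_urls, threshold=5):
    """Traditional hard approach: keep (1.0) or drop (0.0) each session."""
    return (np.asarray(n_urls) >= threshold).astype(float)

def weighted_fcm(X, weights, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means where each session's weight scales its pull on the centres.
    This weighting scheme is an assumption, not the paper's stated method.
    Assumes at least one session has a non-zero weight."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per session
    for _ in range(n_iter):
        Um = (U ** m) * weights[:, None]         # weighted fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance from every session to every centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy data: rows are sessions, columns are URLs (1 = URL visited).
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
], dtype=float)
n_urls = X.sum(axis=1)

weights = session_weight(n_urls, low=0, high=3)  # small sessions are down-weighted
kept = hard_filter(n_urls, threshold=3)          # small sessions are discarded
centers, U = weighted_fcm(X, weights, c=2)
```

With the hard filter, only the two three-URL sessions survive. With the ramp weights, every session still takes part in clustering, and the shorter sessions simply have less influence on where the centres settle.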
