Conference: International Conference on Big Data Analytics and Knowledge Discovery

Web Usage Data Cleaning: A Rule-Based Approach for Weblog Data Cleaning

Abstract

This paper addresses weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits logged by web servers. Because Web Usage Mining is concerned with end-user behavior, user-agent hits are treated as noise to be removed before mining. The most widely referenced and implemented cleaning methods are conventional and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To address these constraints, this contribution introduces a rule-based cleaning method focused on logging structure rules. Experiments with the rule-based method show significant advantages over the content-centric methods.
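
To make the content-centric baseline concrete, the following is a minimal sketch of a filter keyed on the requested resource, the kind of heuristic the abstract attributes to conventional cleaning. It is not the paper's rule-based method, whose rules the abstract does not specify. The Common Log Format input, the extension list `NOISE_EXTENSIONS`, the function names, and the choice to drop malformed lines are all assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of embedded-resource extensions; not taken from the paper.
NOISE_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"}


def requested_resource(log_line):
    """Return the URL path from the quoted request field of a log line, or None."""
    parts = log_line.split('"')
    if len(parts) < 3:
        return None
    request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.0']
    if len(request) < 2:
        return None
    return urlsplit(request[1]).path  # drops any query string


def is_noise(log_line):
    """Classify a hit as noise purely from the extension of the requested resource."""
    path = requested_resource(log_line)
    if path is None:
        return True  # assumption: malformed lines are discarded
    extension = posixpath.splitext(path)[1].lower()
    return extension in NOISE_EXTENSIONS


def conventional_clean(log_lines):
    """Keep only the lines whose requested resource is not in the noise list."""
    return [line for line in log_lines if not is_noise(line)]
```

Because a filter like this looks only at the resource name, it depends on a hand-maintained extension list and sees nothing about how the hit was logged, which is the kind of limitation the abstract raises for the dynamic and responsive web.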
