Conference: International Conference on Big Data Analytics and Knowledge Discovery

Web Usage Data Cleaning: A Rule-Based Approach for Weblog Data Cleaning


Abstract

This paper addresses weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits logged by web servers. Because Web Usage Mining is concerned with end-user behavior, user-agent hits are treated as noise to be removed before mining. The most widely cited and implemented cleaning methods are conventional and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To address these constraints, this contribution introduces a rule-based cleaning method that focuses on the rules of the logging structure. Experiments with the rule-based method show significant advantages over the content-centric methods.
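To make the idea of a content-centric filter on the requested resource concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's method: the abstract does not detail either the conventional or the rule-based procedure. The sketch assumes log lines in Common Log Format, and the suffix list `NOISE_SUFFIXES` and the function names are hypothetical choices made for this example.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical list of suffixes treated as user-agent hits (embedded
# resources). This is an assumption; the abstract gives no such list.
NOISE_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"}

# Matches the quoted request field, e.g. "GET /a/b.png HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<resource>\S+) (?P<protocol>[^"]+)"')


def requested_resource(log_line):
    """Return the requested resource of a log line, or None if unparseable."""
    match = REQUEST_RE.search(log_line)
    return match.group("resource") if match else None


def is_noise(resource):
    """True when the resource path ends with one of the listed suffixes."""
    path = urlsplit(resource).path  # drops any query string
    return PurePosixPath(path).suffix.lower() in NOISE_SUFFIXES


def clean(log_lines):
    """Keep parseable lines whose requested resource is not flagged as noise."""
    kept = []
    for line in log_lines:
        resource = requested_resource(line)
        if resource is not None and not is_noise(resource):
            kept.append(line)
    return kept


sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /img/logo.png?v=2 HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /style.css HTTP/1.0" 200 120',
]
cleaned = clean(sample)
```

A filter of this kind decides purely from the name of the requested resource, which is the property the abstract calls content-centric.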
