Web Usage Data Cleaning: A Rule-Based Approach for Weblog Data Cleaning

Abstract

This paper addresses Weblog data cleaning within the scope of Web Usage Mining. Weblog data record end-user clicks and the underlying user-agent hits logged by web servers. Since Web Usage Mining is interested in end-user behavior, user-agent hits are treated as noise to be removed before mining. The most frequently referenced and implemented cleaning methods are conventional cleaning and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in relevancy, workability, and cost. To deal with these constraints, this contribution introduces a rule-based cleaning method focused on logging structure rules. Experiments with the rule-based method demonstrate significant advantages over the content-centric methods.
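
To make the phrase "content-centric filtering based on the requested resource" concrete, here is a minimal Python sketch of the kind of extension-based filter commonly described as conventional cleaning in the Web Usage Mining literature. It is not the paper's rule-based method, which the abstract does not specify. The Common Log Format input, the extension list, and the function names (`requested_resource`, `is_noise`, `clean`) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed list of embedded-resource extensions; not taken from the paper.
NOISE_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg", ".woff", ".woff2",
}

# Matches the quoted request and the status code of a Common Log Format line,
# e.g. "GET /index.html HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<resource>\S+) [^"]*" (?P<status>\d{3})')


def requested_resource(log_line):
    """Return the requested resource of a log line, or None if it cannot be parsed."""
    match = REQUEST_RE.search(log_line)
    return match.group("resource") if match else None


def is_noise(log_line):
    """Content-centric heuristic: flag a hit as noise by its resource extension."""
    resource = requested_resource(log_line)
    if resource is None:
        return True  # assumption: unparseable lines are dropped in this sketch
    path = urlsplit(resource).path.lower()  # strip the query string
    dot = path.rfind(".")
    # Only treat the dot as an extension separator if it is in the last path segment.
    extension = path[dot:] if dot > path.rfind("/") else ""
    return extension in NOISE_EXTENSIONS


def clean(log_lines):
    """Keep only the lines that the heuristic does not flag as noise."""
    return [line for line in log_lines if not is_noise(line)]


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2018:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '203.0.113.7 - - [10/Oct/2018:13:55:37 +0000] "GET /static/app.js?v=3 HTTP/1.1" 200 512',
    ]
    for line in clean(sample):
        print(line)
```

A filter of this kind depends entirely on a hand-maintained extension list, so it has to be revisited whenever a site changes how it names or serves its resources. This is one plausible reading of the relevancy and cost limits the abstract attributes to content-centric methods.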

Bibliographic details

Source
International Conference on Big Data Analytics and Knowledge Discovery | 2018 | 398p | 11 pages
Authors
Amine Ganibardi; Cherif Arab Ali
Original format: PDF
Chinese Library Classification: TP311.13-53
Keywords
Web Usage Mining; Web usage data preprocessing; Weblog data cleaning

Similar literature

1. A Mapreduce-Based Parallel Data Cleaning Algorithm in Web Usage Mining [J]. Mitali Srivastava, Rakhi Garg, and P. K. Mishra. International Journal of Computer Science & Applications. 2017, Issue 2
2. Cleaning Metadata on the World Wide Web: Suggestions for a Regulatory Approach [J]. Marcel Gordon. The John Marshall Journal of Computer & Information Law. 2006, Issue 4
3. E-business presents data quality challenges on four distinct fronts. Are you prepared to face them? - cleaning up Web data [J]. Julie McNamara. DB2 Magazine: Strategies & Solutions for the Database Professional. 1999, Issue 4
4. Web Usage Data Cleaning: A Rule-Based Approach for Weblog Data Cleaning [C]. Amine Ganibardi, Cherif Arab Ali. International Conference on Big Data Analytics and Knowledge Discovery. 2018
5. Scaling the Technology Opportunity Analysis text data mining methodology: Data extraction, cleaning, online analytical processing analysis, and reporting of large multi-source datasets [D]. George, Richard Peyton. 2006
6. Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy [O]. Peter Donald Hurley, Seb Oliver, Anil Mehta
7. Data Cleaning Framework: An Extensible Approach to Data Cleaning [O]. Gu Randy S. 2010
