Web Usage Data Cleaning: A Rule-Based Approach for Weblog Data Cleaning

Abstract

This paper addresses the issue of Weblog data cleaning within the scope of Web Usage Mining. Weblog data are records of end-user clicks and the underlying user-agent hits logged by web servers. Since Web Usage Mining is concerned with end-user behavior, user-agent hits are treated as noise to be cleaned before mining. The most referenced and implemented cleaning methods are conventional and advanced cleaning. Both are content-centric filtering heuristics based on the requested-resource attribute of the weblog database. In the context of the dynamic and responsive web, these methods are limited in terms of relevancy, workability and cost. To deal with these constraints, this contribution introduces a rule-based cleaning method focused on the logging structure rules. Experiments with the rule-based method demonstrate significant advantages over the content-centric methods.
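
To make the idea of a content-centric heuristic concrete, here is a minimal sketch of filtering on the requested-resource attribute. It is not the authors' method and not their rule-based approach. It assumes log lines in Common Log Format, and the regular expression, the `EMBEDDED_EXTENSIONS` set and all function names are hypothetical choices made for illustration.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Assumed layout: Common Log Format (host ident user [time] "request" status size).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical list of extensions treated as user-agent hits (embedded resources).
EMBEDDED_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",
}


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def is_embedded_resource(resource):
    """Decide from the requested resource name alone (content-centric)."""
    path = urlsplit(resource).path.lower()  # drop any query string
    extension = posixpath.splitext(path)[1]
    return extension in EMBEDDED_EXTENSIONS


def content_centric_clean(lines):
    """Keep parsed entries whose requested resource is not an embedded file."""
    kept = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # unparseable line, skipped in this sketch
        if is_embedded_resource(entry["resource"]):
            continue  # treated as noise by the extension heuristic
        kept.append(entry)
    return kept


sample_log = [
    '10.0.0.1 - - [10/Oct/2018:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2018:13:55:37 +0000] "GET /static/site.css?v=3 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2018:13:55:37 +0000] "GET /img/logo.png HTTP/1.1" 200 4096',
]
cleaned = content_centric_clean(sample_log)
print([entry["resource"] for entry in cleaned])
```

A filter of this kind depends entirely on how resources are named, so it needs a maintained extension list and cannot classify requests whose names carry no telling extension. That dependence is one plausible reading of the limits the abstract attributes to content-centric methods on the dynamic web.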

Bibliographic details

Source
International Conference on Big Data Analytics and Knowledge Discovery | 2018 | pp. 193-203 | 11 pages
Authors
Amine Ganibardi; Cherif Arab Ali
Original format: PDF
Keywords
Web Usage Mining; Web usage data preprocessing; Weblog data cleaning
