Conference: 2012 National Conference on Computing and Communication Systems

Novel pre-processing technique for web log mining by removing global noise and web robots


Abstract

The Internet has become something people depend on in daily life, and almost anything can be searched for online. Web pages usually contain a large amount of information that may not interest the user because it is not part of the page's main content. Web Usage Mining (WUM) applies data mining, artificial intelligence, and related techniques to web data in order to forecast users' visiting behavior and infer their interests from usage samples. WUM is directly involved in applications such as e-commerce, e-learning, Web analytics, and information retrieval. Web log data is one of the major sources of information about the links users visit, their browsing patterns, and the time they spend on a particular page or link. This information can be used in applications such as adaptive web sites, modified services, customer summaries, pre-fetching, and the design of attractive web sites. Existing web usage mining approaches have a variety of problems, and existing algorithms are difficult to apply in practice. This paper continues the line of research on Web access log analysis, which aims to analyze the patterns of web site usage and the features of user behavior. Raw log data is characteristically noisy and unclear, so preprocessing is essential for an efficient and effective mining process. Preprocessing comprises three phases: data cleaning, user identification, and pattern discovery and pattern analysis. In this paper, a novel pre-processing technique is proposed that removes local and global noise and web robots. Preprocessing is an important step because the Web architecture is very complex in nature and 80% of the mining process is done in this phase. The Anonymous Microsoft Web Dataset and the MSNBC.com Anonymous Web Dataset are used to evaluate the proposed preprocessing technique.
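The abstract does not describe the cleaning rules themselves, so the following is only a minimal sketch of what a generic data-cleaning and robot-removal pass over a web server log might look like, not the technique proposed in the paper. It assumes log lines in the Apache combined log format. The names `LOG_PATTERN`, `STATIC_SUFFIXES`, `ROBOT_HINTS`, `parse_line`, `is_robot`, and `clean_log` are hypothetical, and the heuristics (dropping static resources, failed requests, and every request from a client that fetched `/robots.txt` or sent a crawler-like user agent) are common conventions assumed here for illustration.

```python
import re

# Apache combined log format (assumed input format, not stated in the abstract).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical heuristics chosen for illustration only.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def parse_line(line):
    """Return the log fields as a dict, or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_robot(entry):
    """Flag a request that fetches robots.txt or has a crawler-like user agent."""
    agent = (entry["agent"] or "").lower()
    return entry["url"] == "/robots.txt" or any(h in agent for h in ROBOT_HINTS)


def clean_log(lines):
    """Keep only successful GET page requests from clients not flagged as robots."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    # A client that shows robot behaviour once has all of its requests removed.
    robot_ips = {e["ip"] for e in entries if is_robot(e)}
    cleaned = []
    for e in entries:
        if e["ip"] in robot_ips:
            continue
        if e["method"] != "GET" or not e["status"].startswith("2"):
            continue
        # Ignore the query string when checking for static resources.
        path = e["url"].split("?", 1)[0].lower()
        if path.endswith(STATIC_SUFFIXES):
            continue
        cleaned.append(e)
    return cleaned
```

Calling `clean_log` on an iterable of raw log lines returns the surviving entries as dictionaries, which a later user-identification step could group by IP address and user agent. Identifying robots by IP alone is a simplification: several users behind one proxy would be removed together.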