International Conference on Advances in Computer Engineering and Applications

Preprocessing web logs: A critical phase in web usage mining

Abstract

Web usage mining is the discovery of user access patterns from the web logs of a website. Raw web logs are highly unstructured, which makes them unsuitable for direct mining. They therefore pass through a preprocessing stage that both makes them suitable for analysis and significantly reduces the file size. This paper examines the preprocessing phase in detail and proposes a "total and absolute" tool for it, which removes irrelevant and noisy data and transforms the remainder into a form that can be readily used for analysis. The tool is called total and absolute because, after cleaning the data, it presents summary statistics for the preprocessed records. These statistics report the number of records supplied as input, the number of elements remaining after preprocessing, and the time taken to complete the task. Finally, the tool exports the preprocessed data to a .log file that can be easily imported into any data mining utility. The summary statistics and data export features distinguish it from previously proposed tools.
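The pipeline the abstract describes (clean, summarize, export) can be illustrated with a minimal Python sketch. This is not the authors' tool. The abstract does not specify the log format or the cleaning rules, so the sketch assumes Apache Common Log Format input and treats malformed lines, requests for embedded static resources, and non-2xx responses as noise. The names `preprocess`, `export`, `CLF`, and `NOISE_SUFFIXES` are hypothetical.

```python
import re
import time

# Assumption: input is in Apache Common Log Format (the abstract names no format).
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: requests for embedded static resources count as noise.
NOISE_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def preprocess(lines):
    """Clean raw log lines; return (kept_records, summary_statistics)."""
    start = time.perf_counter()
    kept = []
    total = 0
    for line in lines:
        total += 1
        match = CLF.match(line)
        if match is None:
            continue  # malformed entry
        # Ignore the query string when checking the file extension.
        path = match["url"].split("?", 1)[0].lower()
        if path.endswith(NOISE_SUFFIXES):
            continue  # static resource, not a page view
        if not match["status"].startswith("2"):
            continue  # assumed rule: keep only successful requests
        kept.append((match["host"], match["time"], match["url"], match["status"]))
    summary = {
        "records_in": total,         # records fed as input
        "records_out": len(kept),    # records left after preprocessing
        "seconds": time.perf_counter() - start,  # time taken
    }
    return kept, summary


def export(records, path):
    """Write cleaned records to a tab-separated .log file."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write("\t".join(record) + "\n")
```

Calling `preprocess(open("access.log"))` and then `export(kept, "cleaned.log")` would mirror the three stages in the abstract. The file names here are placeholders, and the tab-separated output layout is one arbitrary choice among many.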
