Data preprocessing is the first stage of Web log mining, and session identification is one of its key steps. To improve session identification and the fidelity of the identified sessions, and thereby provide accurate input data for subsequent pattern mining, this paper analyzes the commonly used session identification methods and proposes a method for optimizing the initial session set. In this method, the initial session set is first generated with the traditional access-time-based approach; merge and split operations are then applied to the initial session set to produce an optimized session set. Finally, the method is implemented experimentally. The experimental results show that the quality of the identified sessions is improved.
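The first step of the method, generating the initial session set from access times, can be illustrated with a minimal sketch. It assumes the common timeout heuristic, in which a new session starts whenever the gap between two consecutive requests from the same user exceeds a threshold. The 30-minute threshold, the record layout (user_id, timestamp, url), and the name initial_sessions are assumptions made for illustration and are not taken from the paper. The merge and split rules are not specified in the abstract, so they are not sketched here.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Hypothetical threshold; the abstract does not state the value used.
SESSION_TIMEOUT = timedelta(minutes=30)


def initial_sessions(records, timeout=SESSION_TIMEOUT):
    """Build an initial session set from access times.

    `records` is an iterable of (user_id, timestamp, url) tuples
    (an assumed layout). Returns a list of sessions, each a list of
    records from one user, ordered by timestamp.
    """
    sessions = []
    # Sort by user, then by time, so each user's requests are contiguous.
    ordered = sorted(records, key=itemgetter(0, 1))
    for _, requests in groupby(ordered, key=itemgetter(0)):
        current = []
        for rec in requests:
            # Close the current session when the gap exceeds the timeout.
            if current and rec[1] - current[-1][1] > timeout:
                sessions.append(current)
                current = []
            current.append(rec)
        sessions.append(current)  # groupby groups are never empty
    return sessions


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 10, 0)
    log = [
        ("A", t0, "/index"),
        ("B", t0 + timedelta(minutes=5), "/news"),
        ("A", t0 + timedelta(minutes=10), "/about"),
        ("A", t0 + timedelta(minutes=60), "/contact"),
    ]
    for session in initial_sessions(log):
        print([url for _, _, url in session])
```

A session set produced this way is only a starting point: a fixed timeout can cut one real visit into several sessions or join unrelated visits, which is the kind of error the merge and split operations described above are meant to correct.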