Web page prediction systems based on web log mining can effectively anticipate users' future access requests, supporting goals such as intelligent recommendation and improved network performance. Current prediction models lack effective semantic processing. To address this, we combine word-level semantic information with a statistical language model and propose a statistical web page prediction model based on document relevance computation. The model first computes the topic relevance between web page documents using word frequency information together with the word concept computation model of HowNet, and then combines this semantic information with the conditional probability computed by the statistical model; the combined value serves as the basis for prediction. Experiments show that this technique substantially improves the performance of the prediction model.
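The two-stage scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions. The abstract does not give the combination formula, so the linear interpolation with weight `lam` is hypothetical. The statistical model is taken to be a simple first-order (bigram) model over click sessions. HowNet concept similarity is replaced by a toy lookup table (`TOY_SIM`, `toy_word_sim`). All function names are illustrative.

```python
from collections import Counter, defaultdict


def bigram_probs(sessions):
    """Estimate P(next page | current page) by relative frequency over sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
        for cur, cnt in counts.items()
    }


# Assumption: a stand-in for HowNet concept similarity, values in [0, 1].
TOY_SIM = {frozenset(("football", "soccer")): 0.9}


def toy_word_sim(w, v):
    """Hypothetical word similarity: 1.0 for identical words, else table lookup."""
    if w == v:
        return 1.0
    return TOY_SIM.get(frozenset((w, v)), 0.0)


def doc_relevance(words_a, words_b, word_sim):
    """Symmetric, frequency-weighted average of best-match word similarities."""
    def directed(src, dst):
        freq = Counter(src)
        total = sum(freq.values())
        if total == 0 or not dst:
            return 0.0
        targets = set(dst)
        return sum(
            f * max(word_sim(w, v) for v in targets) for w, f in freq.items()
        ) / total

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


def predict(current, probs, docs, word_sim, lam=0.5):
    """Return the candidate page with the highest combined score, or None.

    Assumed score: lam * P(next | current) + (1 - lam) * relevance(current, next).
    """
    scores = {}
    for nxt, p in probs.get(current, {}).items():
        rel = doc_relevance(docs[current], docs[nxt], word_sim)
        scores[nxt] = lam * p + (1 - lam) * rel
    return max(scores, key=scores.get) if scores else None
```

With `lam=1.0` the sketch reduces to a purely statistical predictor. Lowering `lam` lets topic relevance override raw transition frequency, which is the effect the abstract attributes to adding semantic information.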