
A machine learning approach to web mining



Abstract

In this paper, a Web mining tool for content-based classification of Web pages is presented. The tool, named WebClass, can be used for resource discovery purposes. The system considers both the textual content of Web pages and the layout structure defined by HTML tags. The representation language adopted for Web pages is the bag-of-words, where words are selected from training documents by means of a novel scoring measure. Three different classification models are empirically compared on a classification task: decision trees, centroids, and k-nearest-neighbor. Experimental results are reported, and conclusions are drawn on the relevance of the HTML layout structure for classification purposes, on the significance of the words selected by the scoring measure, and on the performance of the different classifiers.
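The abstract does not give implementation details, so the following is only a minimal sketch of two of the classifier families it names (centroids and k-nearest-neighbor) applied to a bag-of-words representation. The vocabulary, the toy documents, the class labels, the use of cosine similarity, and all function names are assumptions made for illustration. The paper's scoring measure for word selection and its use of HTML layout structure are not reproduced here; the vocabulary is simply fixed by hand.

```python
import math
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Term-frequency vector of `text`, restricted to the words in `vocabulary`."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in vocabulary]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def train_centroids(docs, labels, vocabulary):
    """Average the bag-of-words vectors of the training documents in each class."""
    sums, sizes = {}, Counter()
    for text, label in zip(docs, labels):
        acc = sums.setdefault(label, [0.0] * len(vocabulary))
        for i, x in enumerate(bag_of_words(text, vocabulary)):
            acc[i] += x
        sizes[label] += 1
    return {c: [x / sizes[c] for x in acc] for c, acc in sums.items()}

def classify_centroid(text, centroids, vocabulary):
    """Assign the class whose centroid is most similar to the page."""
    vec = bag_of_words(text, vocabulary)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

def classify_knn(text, docs, labels, vocabulary, k=3):
    """Majority vote among the k training documents most similar to the page."""
    vec = bag_of_words(text, vocabulary)
    ranked = sorted(
        zip(docs, labels),
        key=lambda dl: cosine(vec, bag_of_words(dl[0], vocabulary)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a hand-picked vocabulary and four labeled "pages".
vocabulary = ["course", "lecture", "research", "publication"]
docs = [
    "course lecture notes course",
    "lecture schedule for the course",
    "research publication list",
    "recent research and publication",
]
labels = ["teaching", "teaching", "research", "research"]

centroids = train_centroids(docs, labels, vocabulary)
centroid_prediction = classify_centroid("new lecture for this course", centroids, vocabulary)
knn_prediction = classify_knn("research publication draft", docs, labels, vocabulary, k=3)
```

In this sketch the centroid model stores one averaged vector per class, while k-nearest-neighbor keeps every training document and compares the new page against all of them, which is the main practical trade-off between the two.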
