Journal: International Journal of Computer Science and Applications

A cross training corrective approach for web page classification


Abstract

© Technomathematics Research Foundation.

Textual document classification is a challenging area of data mining, and web page classification is one form of it. However, the text in a web page is not homogeneous, since a single page can discuss related but different subjects. As a result, a textual classifier performs worse on web pages than on ordinary textual documents, and a technique is needed to correct its results. One category of such techniques uses the hidden underlying information in the test set to correct the classes assigned by a textual classifier. In this paper, we propose a method in this category: a Cross Training based Corrective approach (CTC) for web page classification, which learns information from the test set in order to fix the classes that a text classifier initially assigned to that test set. This adjustment leads to a significant improvement in classification results. We tested our approach with three traditional classification algorithms, Support Vector Machine (SVM), Naïve Bayes (NB) and K Nearest Neighbors (KNN), on four subsets of the Open Directory Project (ODP). Results show that our collective and corrective approach, when applied after SVM, NB or KNN, enhances their classification results by up to 12.39.
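The abstract does not spell out the CTC algorithm, so the following is only a rough sketch of the general idea of correcting a classifier's output using the test set itself. It rests on several assumptions that are not from the paper: "cross training" is read here as a k-fold loop over the test set, the corrective learner is a small multinomial Naive Bayes, and the names `train_nb`, `predict_nb` and `cross_training_correct` are hypothetical. The initial labels stand in for the output of any base classifier (SVM, NB or KNN).

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the most probable class for one tokenized document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            # Laplace (add-one) smoothing over the vocabulary
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_training_correct(test_docs, initial_labels, n_folds=2):
    """Assumed reading of 'cross training' (not the paper's exact procedure):
    for each fold of the test set, train on the other folds using the labels
    the base classifier assigned, then re-label the held-out fold."""
    corrected = list(initial_labels)
    indices = range(len(test_docs))
    for fold in range(n_folds):
        held_out = [i for i in indices if i % n_folds == fold]
        rest = [i for i in indices if i % n_folds != fold]
        model = train_nb([test_docs[i] for i in rest],
                         [initial_labels[i] for i in rest])
        for i in held_out:
            corrected[i] = predict_nb(model, test_docs[i])
    return corrected

# Toy test set: four sports pages, four finance pages (made-up data).
test_docs = [
    ["goal", "match", "team"], ["goal", "team", "league"],
    ["match", "league", "goal"], ["team", "match", "league"],
    ["stock", "market", "bank"], ["bank", "stock", "trade"],
    ["market", "trade", "stock"], ["trade", "bank", "market"],
]
# Labels from a hypothetical base classifier; index 2 is wrong.
initial_labels = ["sports", "sports", "finance", "sports",
                  "finance", "finance", "finance", "finance"]
corrected_labels = cross_training_correct(test_docs, initial_labels)
print(corrected_labels)
```

On this toy input the mislabeled third page is moved back to `sports`, because the vocabulary shared with the other test pages outweighs the single wrong label. Real web pages are far noisier, and such a correction step can also overwrite labels that were initially right.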
