Innovating Web Page Classification Through Reducing Noise

Abstract

This paper presents a new method for eliminating noise in Web page classification. It first describes the presentation of a Web page based on HTML tags. Then, through a novel distance formula, it eliminates noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm that distinguishes related hyperlinks from noisy ones. The non-noisy hyperlinks can then be used to improve the performance of Web page classification (the CAWN algorithm). Any page can be classified through the text and categories of the neighbor pages related to it. The experimental results show that our approach improves classification accuracy.
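
As a rough illustration of the neighbor-based idea in the abstract, the following Python sketch filters a page's hyperlinks by a text-similarity threshold and then takes a majority vote over the categories of the remaining neighbors. It is not the CAWN algorithm. The cosine similarity over raw term counts, the `threshold` value, and the names `cosine_similarity` and `classify_by_neighbors` are all assumptions made for illustration; the paper's HTML-tag-based page description and its distance formula are not reproduced here.

```python
from collections import Counter
import math


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-count vectors (an assumed stand-in measure)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_by_neighbors(page_text, neighbors, threshold=0.2):
    """Classify a page from its linked neighbors.

    neighbors: list of (neighbor_text, neighbor_category) pairs.
    Links whose text similarity to the page falls below `threshold`
    (a hypothetical cutoff) are treated as noisy and ignored; the
    remaining neighbors vote with their category.
    Returns None when every link is filtered out.
    """
    votes = Counter(
        category
        for text, category in neighbors
        if cosine_similarity(page_text, text) >= threshold
    )
    return votes.most_common(1)[0][0] if votes else None


page = "neural network training for image classification"
links = [
    ("deep neural network image models", "machine-learning"),
    ("training a classification network", "machine-learning"),
    ("cheap flights and hotel deals", "travel"),
]
print(classify_by_neighbors(page, links))  # prints "machine-learning"
```

In this toy input the travel link shares no terms with the page, so it is discarded as noise and the two remaining neighbors decide the category.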

Bibliographic record

  • Source
    《计算机科学技术学报:英文版》 (Journal of Computer Science and Technology, English edition) | 2002, Issue 1 | pp. 9-17 | 9 pages
  • Authors

    李晓黎 (Li Xiaoli); 史忠植 (Shi Zhongzhi);

  • Author affiliations

    Key Laboratory of Intelligent Information Processing;

    Institute of Computing Technology, The Chinese Academy of Sciences;

    Beijing 100080;

    P.R. China;

    Key Laboratory of Intelligent Information Processing;

    Institute of Computing Te;

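  • Original format: PDF
  • Text language: chi
  • Chinese Library Classification: TP393.092;
  • Keywords

    noise reduction; Web page classification; computer networks;

    Machine translation: Web page classification; similarity measure; noise-free classification algorithm;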