Source: Advances in Multimedia Information Processing - PCM 2008 (conference proceedings)
Automatic Web Page Classification Using Various Features

Abstract

A model for automatically classifying uncertain Web pages using multiple features is presented. Because the traditional tree structure can barely classify an avalanche of new Web pages, the proposed approach partially adopts the "bag of words" idea and combines it with classification fusion to describe and categorize Web pages. It extracts features of Web pages from three perspectives: consulting a Web directory service, analyzing the text features of the pages' titles and meta-search keywords, and identifying the primary content of the pages. By fusing the results of these three dedicated classifiers, each Web page is assigned to one or more categories, together with a set of words that represents the page. Experiments were carried out to demonstrate the effectiveness of the proposed method; in them, Web pages are classified into four categories using the proposed fusion method. A comparison between the dedicated classifiers and the fusion methods is also presented.
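
The abstract does not say which fusion rule the authors use, so the following is only an illustrative sketch of one common option: a weighted average of per-category scores from three classifiers, followed by a threshold so that a page can receive one or more categories. The category names, the score values, the weights, and the `fuse` and `assign` helpers are all assumptions made for this example and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical category names; the abstract only states that there are four.
CATEGORIES = ["news", "sports", "shopping", "education"]

Scores = Dict[str, float]


def fuse(scores_per_classifier: List[Scores], weights: List[float]) -> Scores:
    """Weighted average of per-category scores (an assumed fusion rule)."""
    if len(scores_per_classifier) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = {}
    for category in CATEGORIES:
        # A classifier that is silent about a category contributes 0.0.
        weighted = sum(
            w * s.get(category, 0.0)
            for s, w in zip(scores_per_classifier, weights)
        )
        fused[category] = weighted / total
    return fused


def assign(fused: Scores, threshold: float = 0.5) -> List[str]:
    """Return every category at or above the threshold, else the single best one."""
    chosen = [c for c in CATEGORIES if fused[c] >= threshold]
    return chosen or [max(CATEGORIES, key=lambda c: fused[c])]


# Made-up scores standing in for the three dedicated classifiers.
directory_scores = {"news": 0.9, "sports": 0.1}
title_meta_scores = {"news": 0.6, "sports": 0.6}
content_scores = {"news": 0.7, "sports": 0.5, "shopping": 0.1}

fused_scores = fuse(
    [directory_scores, title_meta_scores, content_scores],
    weights=[1.0, 1.0, 1.0],
)
labels = assign(fused_scores, threshold=0.5)
```

Lowering the threshold lets a page fall into several categories at once, which matches the "one or more categories" behavior described above. Unequal weights would let a more reliable classifier dominate the decision.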