Published in: Web Information Systems Engineering - WISE 2011

Enhance Web Pages Genre Identification Using Neighboring Pages

Abstract

Web page genre identification has recently attracted more attention because of its importance in web search. Most existing work extracts features from the web pages themselves and applies machine learning classifiers such as SVM to identify each page's genre. However, when a page does not contain enough information, such an approach may not work well. In this paper, we tackle genre identification in such situations. We propose a link-based graph model that takes neighboring pages into account while greatly reducing noisy information by selecting an appropriate subset of those neighbors. We compared this neighboring-pages-based classifier with other classifiers. Experiments were conducted on two known corpora, and the favorable results indicate that the proposed approach is feasible.
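
The abstract does not say how the neighbor subset is chosen or how neighbor evidence is combined with a page's own evidence, so the following is only an illustrative sketch of the general idea, not the paper's model. It assumes that neighbors are filtered by cosine similarity of term-weight features against a threshold, and that the page's own genre scores are blended with the mean scores of the selected neighbors. The function names, the `threshold` and `alpha` parameters, and the data layout are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_neighbors(page, links, features, threshold=0.3):
    """Assumed selection rule: keep linked pages similar enough to `page`."""
    return [
        n for n in links.get(page, ())
        if n in features and cosine(features[page], features[n]) >= threshold
    ]


def combined_scores(page, links, features, local_scores, alpha=0.5, threshold=0.3):
    """Assumed combination rule: blend own scores with the neighbors' mean scores.

    `local_scores` maps each page to per-genre scores from any page-level
    classifier (for example an SVM); it is supplied by the caller.
    """
    own = local_scores[page]
    chosen = [
        n for n in select_neighbors(page, links, features, threshold)
        if n in local_scores
    ]
    if not chosen:
        # No usable neighbors: fall back to the page's own scores.
        return dict(own)
    mean = {
        g: sum(local_scores[n].get(g, 0.0) for n in chosen) / len(chosen)
        for g in own
    }
    return {g: alpha * own[g] + (1 - alpha) * mean[g] for g in own}


# Toy data (made up for illustration only).
features = {
    "p": {"blog": 1.0},
    "a": {"blog": 1.0, "post": 1.0},
    "b": {"shop": 1.0},
}
links = {"p": ["a", "b"]}
local_scores = {
    "p": {"blog": 0.5, "shop": 0.5},
    "a": {"blog": 0.9, "shop": 0.1},
    "b": {"blog": 0.1, "shop": 0.9},
}
result = combined_scores("p", links, features, local_scores)
predicted = max(result, key=result.get)
```

In this toy setup the dissimilar neighbor `b` is filtered out as noise, so only `a` influences the ambiguous page `p`. Any real system would need to tune the selection rule and the blending weight on labeled data.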