Wuhan University Journal of Natural Sciences

A Method of Eliminating Noises in Web Pages by Style Tree Model and Its Applications



Abstract

A Web page typically contains many information blocks. Apart from the main content blocks, it usually includes blocks such as navigation panels, copyright and privacy notices, and advertisements. We call these blocks noisy blocks. Noise in Web pages can seriously harm Web data mining. To eliminate this noise, we introduce a new tree structure, called the Style Tree, and study an algorithm for constructing a site style tree. The Style Tree Model is used to detect and eliminate noise in any Web page of the site. An information-based measure for determining which element nodes are noisy is also constructed. In addition, applications of this method are discussed in detail. Experimental results show that our noise elimination technique improves the mining results significantly.
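The abstract does not spell out the data structure or the measure, so the following is only a rough sketch of the general idea under stated assumptions. It assumes a page is given as a nested (tag, children) tuple, merges several pages of one site into a simplified style tree, and scores each element node by the entropy of its child layouts across pages. The names StyleNode, add_page, importance and is_noisy, the entropy formula, and the 0.2 threshold are all hypothetical choices for illustration and are not taken from the paper.

```python
import math

class StyleNode:
    """Element node of a simplified style tree merged over many pages (hypothetical)."""
    def __init__(self, tag):
        self.tag = tag
        self.pages = 0    # number of pages that reached this node
        self.styles = {}  # child-layout key -> [page count, child StyleNodes]

def layout_key(children):
    # Text is folded into the key, so differing text counts as a differing style.
    return tuple("#text:" + c if isinstance(c, str) else c[0] for c in children)

def add_page(node, element):
    """Merge one page's (tag, children) subtree into the style tree."""
    _, children = element
    node.pages += 1
    elements = [c for c in children if not isinstance(c, str)]
    key = layout_key(children)
    if key not in node.styles:
        node.styles[key] = [0, [StyleNode(c[0]) for c in elements]]
    entry = node.styles[key]
    entry[0] += 1
    for child_node, child in zip(entry[1], elements):
        add_page(child_node, child)

def importance(node):
    """Assumed measure: entropy of the style distribution, log base = page count."""
    m = node.pages
    if m <= 1:
        return 1.0  # a single page gives no evidence of repetition
    return sum(-(n / m) * math.log(n / m, m) for n, _ in node.styles.values())

def is_noisy(node, threshold=0.2):
    """A node is treated as noisy if it and all its descendants barely vary."""
    if importance(node) >= threshold:
        return False
    return all(is_noisy(child, threshold)
               for _, kids in node.styles.values() for child in kids)

# Two toy pages sharing a navigation block but differing in main content.
nav = ("div", [("a", ["Home"]), ("a", ["About"])])
page1 = ("body", [nav, ("p", ["Story one"])])
page2 = ("body", [nav, ("p", ["Story two"])])

root = StyleNode("body")
for page in (page1, page2):
    add_page(root, page)

nav_node, content_node = root.styles[("div", "p")][1]
print(is_noisy(nav_node), is_noisy(content_node))
```

In this toy example the navigation block looks identical on both pages, so its entropy is zero and it is flagged as noisy, while the paragraph whose text differs between pages is kept. A real site would need an HTML parser and a more careful treatment of leaf content than this sketch provides.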
