Journal: Data & Knowledge Engineering

A cost-effective method for detecting web site replicas on search engine databases

Abstract

Identifying replicated sites is an important task for search engines. It can reduce data storage costs, improve query processing time and remove noise that might affect the quality of the final answers given to the user. This paper introduces a new approach to detect web sites that are likely to be replicas in a search engine database. Our method uses the websites' structure and the content of their pages to identify possible replicas. As we show through experiments, such a combination improves the precision and reduces the overall costs related to the replica detection task. Our method achieves a quality improvement of 47.23% when compared to previously proposed approaches.
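The abstract does not describe the detection algorithm itself, so the following is only a hypothetical illustration of the general idea of combining a site's structure with its page content. It is not the authors' method. The sketch assumes each site is a mapping from URL path to page text. It measures structural similarity as the Jaccard similarity of the two path sets and content similarity as the Jaccard similarity of word k-shingles. The cheap structural check runs first, so the more expensive content comparison is skipped for pairs that clearly differ. The names `candidate_replicas`, `struct_min`, `content_min` and `k`, and their default values, are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical representation (not from the paper): URL path -> page text.
Site = dict[str, str]


def shingles(text: str, k: int) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a page's text."""
    words = text.lower().split()
    if len(words) < k:
        # Short pages yield a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_shingles(site: Site, k: int) -> set[tuple[str, ...]]:
    """Union of the shingles of every page on a site."""
    result: set[tuple[str, ...]] = set()
    for text in site.values():
        result |= shingles(text, k)
    return result


def candidate_replicas(
    sites: dict[str, Site],
    struct_min: float = 0.5,   # hypothetical structural threshold
    content_min: float = 0.8,  # hypothetical content threshold
    k: int = 4,                # hypothetical shingle length
) -> list[tuple[str, str]]:
    """Return host pairs whose structure and content are both similar."""
    found = []
    for a, b in combinations(sorted(sites), 2):
        # Stage 1: cheap structural filter on the sets of URL paths.
        if jaccard(set(sites[a]), set(sites[b])) < struct_min:
            continue
        # Stage 2: content comparison, only for structurally similar pairs.
        c = jaccard(content_shingles(sites[a], k), content_shingles(sites[b], k))
        if c >= content_min:
            found.append((a, b))
    return found


if __name__ == "__main__":
    sites = {
        "a.example": {"/": "welcome to the example store home page",
                      "/about": "about our small family store since 1990"},
        "b.example": {"/": "welcome to the example store home page",
                      "/about": "about our small family store since 1990"},
        "c.example": {"/": "completely different text about gardening tips",
                      "/blog": "more gardening tips"},
    }
    print(candidate_replicas(sites))  # [('a.example', 'b.example')]
```

Here `c.example` shares only one of three paths with the other sites, so the structural filter discards it before any shingles are compared. This ordering illustrates one plausible way that structure can lower the cost of content comparison. It does not claim to reproduce how the paper achieves its reported savings.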