Proceedings of the 23rd ACM Conference on Hypertext and Social Media

Building Enriched Web Page Representations using Link Paths


Abstract

Anchor text has long been used to enrich documents for a variety of tasks on the World Wide Web. Anchor texts are useful because they are similar to typical Web queries, and because they express the document's context. Therefore, it is common practice for Web search engines to incorporate incoming anchor text into the document's standard textual representation. However, this approach does not suffice for documents with very few inlinks, and it does not incorporate the document's full context. To remedy these problems, we employ link paths, which contain anchor texts from paths through the Web ending at the document in question. We propose and study several different ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known-item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real-world test collections. We find that our approach significantly improves performance over baseline and existing techniques in both tasks.
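The idea of a link path can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say which aggregation schemes were studied, so the distance-decay weighting here is an assumption chosen for illustration. The toy link graph, the function names `link_paths` and `aggregate`, and the parameters `max_len` and `decay` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy link graph: (source_page, target_page, anchor_text).
LINKS = [
    ("home", "products", "Products"),
    ("products", "cameras", "Digital Cameras"),
    ("cameras", "item42", "Acme X100"),
    ("home", "deals", "Deals"),
    ("deals", "item42", "Acme X100 sale"),
]


def link_paths(links, target, max_len=3):
    """Return maximal cycle-free link paths (at most max_len links) ending at target.

    Each path is a list of anchor texts, ordered from the link farthest
    from the target to the link that points directly at it.
    """
    inlinks = defaultdict(list)
    for src, dst, anchor in links:
        inlinks[dst].append((src, anchor))

    paths = []

    def walk(page, anchors, visited):
        extended = False
        if len(anchors) < max_len:
            for src, anchor in inlinks[page]:
                if src not in visited:
                    extended = True
                    walk(src, anchors + [anchor], visited | {src})
        # Record the path only when it cannot be extended any further.
        if anchors and not extended:
            paths.append(anchors[::-1])

    walk(target, [], {target})
    return paths


def aggregate(paths, decay=0.5):
    """Assumed scheme: a term's weight shrinks by `decay` per link of distance."""
    weights = Counter()
    for path in paths:
        for distance, anchor in enumerate(reversed(path)):
            for term in anchor.lower().split():
                weights[term] += decay ** distance
    return weights


paths = link_paths(LINKS, "item42")
weights = aggregate(paths)
```

In this toy graph the page `item42` has only two inlinks, yet its representation also picks up terms such as "cameras" and "deals" from pages further up the paths, at reduced weight. This is the kind of added context the abstract describes for documents with few inlinks.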
