Venue: International Conference of the CLEF Initiative

Supporting More-Like-This Information Needs: Finding Similar Web Content in Different Scenarios


Abstract

We examine more-like-this information needs in different scenarios. A more-like-this information need occurs when a user sees one interesting document and wants to access other, similar documents. One of our foci is on comparing different strategies to identify related web content. We compare three strategies: following links (i.e., crawling), automatically generating keyqueries for the seen document (i.e., queries that rank the document near the top of their results), and search engine operators that automatically display related results. Our experimental study shows that different strategies yield the most promising related results in different scenarios. One of our use cases is to automatically support people who monitor right-wing content on the web. In this scenario, crawling from a given set of seed documents turns out to be the best strategy for finding related pages with similar content; querying and the related operator yield far fewer good results. In the case of news portals, however, crawling is a bad idea since hardly any news portal links to other news portals. Instead, querying or a search engine's related operator is the better strategy. Finally, for identifying scientific publications related to a given paper, all three strategies yield good results.
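The keyquery idea mentioned in the abstract (queries that return the seen document near the top of their results) can be illustrated with a small sketch. This is not the authors' implementation: the toy `search` function, the term-frequency scoring, the helper name `keyqueries`, and its parameters `vocab_size`, `max_len`, and `top_k` are all assumptions made for illustration, and a real setup would query an actual search engine instead.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def search(query_terms, corpus):
    """Toy stand-in for a search engine (assumption, not a real API).

    Ranks documents by the summed frequency of the query terms and
    breaks ties by document id so the ranking is deterministic.
    """
    scores = {}
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def keyqueries(seed_id, corpus, vocab_size=10, max_len=2, top_k=1):
    """Return term combinations that rank the seed document in the top_k.

    Hypothetical simplification: candidate queries are combinations of
    the seed's most frequent terms, and a query is kept only if it also
    returns at least one other document (otherwise it finds nothing new).
    """
    counts = Counter(tokenize(corpus[seed_id]))
    vocab = [term for term, _ in counts.most_common(vocab_size)]
    found = []
    for n in range(1, max_len + 1):
        for query in combinations(vocab, n):
            ranking = search(query, corpus)
            if seed_id in ranking[:top_k] and len(ranking) > 1:
                found.append(query)
    return found


# Made-up three-document corpus; "a" plays the role of the seen document.
corpus = {
    "a": "crawling web pages crawling links",
    "b": "news portal links news",
    "c": "crawling news",
}
result = keyqueries("a", corpus)
```

In this sketch, the other documents returned by each surviving query would serve as the more-like-this candidates for the seen document.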
