Ranking XPaths for extracting search result records


Abstract

Extracting search result records (SRRs) from webpages is useful for building an aggregated search engine that combines search results from a variety of search engines. Most automatic approaches to search result extraction are not portable: the complete process has to be rerun on each new search result page. In this paper we describe an algorithm to automatically determine XPath expressions to extract SRRs from webpages. Based on a single search result page, an XPath expression is determined which can be reused to extract SRRs from pages based on the same template. The algorithm is evaluated on six datasets, including two new datasets containing a variety of web, image, video, shopping and news search results. The evaluation shows that for 85% of the tested search result pages, a useful XPath is determined. The algorithm is implemented as a browser plugin and as a standalone application, both of which are available as open source software.
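To make the reuse idea concrete, the following minimal sketch picks one XPath from a few candidates on a single result page and then applies it unchanged to a second page built on the same template. Everything in it is an assumption for illustration: the two pages, the candidate list, the helper names `extract_records` and `rank_xpaths`, and in particular the scoring rule (count the matched nodes that contain a link), which is a toy stand-in and not the ranking described in the paper. It uses the limited XPath subset of Python's standard `xml.etree.ElementTree`, so the pages must be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical result pages that share one template (well-formed XHTML).
PAGE_1 = """<html><body>
<div id="nav"><a href="/home">Home</a></div>
<div id="results">
  <div class="result"><a href="http://a.example/1">First hit</a><p>Snippet one</p></div>
  <div class="result"><a href="http://a.example/2">Second hit</a><p>Snippet two</p></div>
  <div class="result"><a href="http://a.example/3">Third hit</a><p>Snippet three</p></div>
</div>
</body></html>"""

PAGE_2 = """<html><body>
<div id="nav"><a href="/home">Home</a></div>
<div id="results">
  <div class="result"><a href="http://b.example/1">Other hit</a><p>Other snippet</p></div>
  <div class="result"><a href="http://b.example/2">Another hit</a><p>More text</p></div>
</div>
</body></html>"""


def extract_records(page, xpath):
    """Return title/URL pairs for every node matched by xpath that contains a link."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(xpath):
        link = node.find(".//a")
        if link is None:
            continue  # a node without a link is not treated as a record here
        title = "".join(link.itertext()).strip()
        records.append({"title": title, "url": link.get("href")})
    return records


def rank_xpaths(page, candidates):
    """Order candidates by how many link-bearing nodes they match (toy score)."""
    scored = [(len(extract_records(page, xp)), xp) for xp in candidates]
    return [xp for _, xp in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical candidates; a real system would generate these from the page.
CANDIDATES = [".//div[@id='nav']", ".//div[@class='result']", ".//p"]

# Determine the expression once, from a single page ...
best = rank_xpaths(PAGE_1, CANDIDATES)[0]

# ... then reuse it on another page with the same template.
for record in extract_records(PAGE_2, best):
    print(record["title"], record["url"])
```

The point of the sketch is the division of labour: the ranking step runs once per template, while extraction on later pages is a single XPath evaluation. Real search result pages are rarely well-formed XML, so a practical version would need an HTML-tolerant parser with full XPath support.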

Bibliographic details

Authors: Trieschnigg, Dolf; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd
Year: 2012
Original format: PDF
Language: English
