WMS- Extracting Multiple Sections Data Records from Search Engine Results Pages

Abstract

In this paper, we develop an automatic wrapper for extracting multiple sections data records from search engine results pages. In the information extraction field, little attention has been paid to wrappers that extract multiple sections data records: only one automatic wrapper, MSE, has been developed for this purpose. MSE uses the separation distance between data records and sections to distinguish them and extract them from search engine results pages. Our approach instead uses DOM tree properties to build an adaptive search method that can detect, differentiate, and partition sections and data records. The labeled multiple sections data records are then passed through several filtering stages. Each filter is designed to remove a particular group of irrelevant data, and filtering continues until one data region containing the relevant records is found. The filtering rules are based on visual cues, such as text and image size, obtained from the browser rendering engine. Experimental results show that our wrapper obtains better results than the currently available MSE wrapper.
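
The staged filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the actual rules or thresholds, so the `Region` fields, the three filters, and the 11-pixel cutoff below are all hypothetical stand-ins for measurements that would come from a DOM tree and a browser rendering engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """A candidate data region. All fields are hypothetical measurements."""
    name: str
    record_count: int      # repeated sibling subtrees found in the region
    avg_text_px: float     # average rendered text size, in pixels
    total_image_area: int  # summed rendered image area, in square pixels


Filter = Callable[[List[Region]], List[Region]]


def drop_singletons(regions: List[Region]) -> List[Region]:
    # Assumed rule: a region with fewer than two records is not a record list.
    return [r for r in regions if r.record_count >= 2]


def drop_small_text(regions: List[Region], min_px: float = 11.0) -> List[Region]:
    # Assumed rule: very small text suggests navigation or footer links.
    return [r for r in regions if r.avg_text_px >= min_px]


def keep_most_records(regions: List[Region]) -> List[Region]:
    # Assumed tie-breaker: prefer the region with the most records.
    return [max(regions, key=lambda r: r.record_count)] if regions else []


def run_filters(regions: List[Region], filters: List[Filter]) -> List[Region]:
    """Apply filters in order and stop once a single region remains."""
    for apply_filter in filters:
        if len(regions) <= 1:
            break
        kept = apply_filter(regions)
        # A filter that would discard every candidate is skipped.
        if kept:
            regions = kept
    return regions


if __name__ == "__main__":
    candidates = [
        Region("nav", record_count=6, avg_text_px=9.0, total_image_area=0),
        Region("ad", record_count=1, avg_text_px=13.0, total_image_area=40000),
        Region("results", record_count=10, avg_text_px=13.0, total_image_area=2000),
        Region("related", record_count=4, avg_text_px=13.0, total_image_area=0),
    ]
    pipeline = [drop_singletons, drop_small_text, keep_most_records]
    print([r.name for r in run_filters(candidates, pipeline)])
```

Each filter only narrows the candidate list, so stages can be reordered or replaced without touching `run_filters`. A real wrapper would derive the region measurements from the rendered page rather than from hand-written values.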

Bibliographic record

Source: Annual ACM Symposium on Applied Computing (SAC 2010) | 2010 | pp. 1696-1701 | 6 pages
Authors: Jer Lang Hong; Eu-Gene Siew; Simon Egerton
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: automatic wrapper; search engine results pages; multiple sections data records

Similar documents

1. Using data island method for creating metadata records with indexability and visibility of tag names in web search engines [J]. Sayyed Mahdi Taheri, Nadjla Hariri, Sayyed Rahmatollah Fattahi. Library Hi Tech. 2014, no. 1
2. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines [J]. Kwon T., Choi H., Vogel C. Journal of Proteome Research. 2011, no. 7
3. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project [J]. Rashmi Patel, Nishamali Jayatilleke, Anna Kolliakou. BMJ Open. 2017, no. 1
4. WMS- Extracting Multiple Sections Data Records from Search Engine Results Pages [C]. Annual ACM Symposium on Applied Computing. 2010
5. Automatic wrapper generation for the extraction of search result records from search engines [D]. Zhao, Hongkun. 2007
6. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines [O]. Taejoon Kwon, Hyungwon Choi, Christine Vogel
7. MSblender: A Probabilistic Approach for Integrating Peptide Identifications from Multiple Database Search Engines [O]. Kwon, Taejoon, Choi, Hyungwon, Vogel, Christine. 2015
