Visual Segmentation-Based Data Record Extraction from Web Documents


Abstract

Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method to extract data records from such pages. The VSDR method first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; it then extracts neighboring and non-neighboring data records with a compress-and-collapse technique. Experimental results show that, unlike existing methods, which generate good results only on their own test domains, VSDR is a general data record extraction method that produces stable, good results on a wide range of Web pages.
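
The first stage described in the abstract, grouping page elements by spatial closeness and visual resemblance, can be illustrated with a toy sketch. This is not the VSDR algorithm itself: the `Block` type, the `style` signature, and the pixel thresholds `tol` and `max_gap` are all assumptions made for illustration. The compress-and-collapse stage is not shown, because the abstract does not describe how it works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rendered page element (hypothetical type): bounding box plus a style signature."""
    x: int
    y: int
    width: int
    height: int
    style: str  # assumed coarse visual signature, e.g. a font/colour label


def visually_similar(a: Block, b: Block, tol: int = 5) -> bool:
    """Same style, with left edge and width within `tol` pixels (assumed threshold)."""
    return (a.style == b.style
            and abs(a.x - b.x) <= tol
            and abs(a.width - b.width) <= tol)


def spatially_close(a: Block, b: Block, max_gap: int = 20) -> bool:
    """The vertical gap from the bottom of `a` to the top of `b` is at most `max_gap`."""
    return 0 <= b.y - (a.y + a.height) <= max_gap


def group_blocks(blocks: list[Block]) -> list[list[Block]]:
    """Group top-to-bottom neighbours that are both close and similar."""
    groups: list[list[Block]] = []
    for block in sorted(blocks, key=lambda b: b.y):
        last = groups[-1][-1] if groups else None
        if last and spatially_close(last, block) and visually_similar(last, block):
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups


# A made-up page: a header, three product-like items, and a footer.
page = [
    Block(10, 0, 600, 40, "header"),
    Block(10, 100, 600, 80, "item"),
    Block(10, 190, 600, 80, "item"),
    Block(12, 280, 598, 80, "item"),
    Block(10, 500, 600, 30, "footer"),
]
groups = group_blocks(page)
print([len(g) for g in groups])  # [1, 3, 1]
```

In this sketch the three "item" blocks end up in one group, which plays the role of a candidate data-record region. Note that it only joins adjacent blocks, so it does not handle the non-neighboring records the abstract mentions.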


Bibliographic details

Source: Information Reuse and Integration, 2007 IEEE International Conference on, 2007, pp. 502-507 (6 pages)
Conference location: Kent (GB)
Authors: Li Longzhuang; Liu Yonghuai; Obregon Abel; Weatherston Matt
Original format: PDF
Language: English (eng)
Chinese Library Classification: Industrial technology
Date indexed: 2022-08-26 14:03:20

Similar literature

1. Extraction of Unstructured Data Records and Discovering New Attributes from the Web Documents [J]. Padmapriya.G, Dr.M.Hemalatha. International Journal of Computer Trends and Technology, 2014, No. 3
2. Validation of the TOtal Visual acuity extraction Algorithm (TOVA) for automated extraction of visual acuity and intraocular pressure data from free text clinical records [J]. Baughman Doug, Lee Cecilia, Lee Aaron Y. Investigative Ophthalmology & Visual Science, 2017, No. 8
3. Visual Segmentation-Based Data Record Extraction from Web Documents [C]. Li, Longzhuang, Liu, . 2007
4. Segmentation-based filtering and object-based feature extraction from airborne LiDAR point cloud data [D]. Chang, Jie. 2011
5. Validation of the Total Visual Acuity Extraction Algorithm (TOVA) for Automated Extraction of Visual Acuity Data From Free Text Unstructured Clinical Records [O]. Douglas M. Baughman, Grace L. Su, Irena Tsui, -1
6. Information source discovery and record extraction from web and document databases [O]. 張建偉, チョウケンイ. 2008
