OWDEAH: Online Web Data Extraction Based on Access History



Abstract

Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. Extracting structured data from semi-structured documents has been an active research problem. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets, and especially annotating them for supervised methods, is very time-consuming. We propose a framework for Web data extraction that logs users' access history and exploits it to assist automatic training-set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.
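
The abstract does not specify how documents are clustered "according to their structural details", so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each accessed page is summarized by its set of root-to-element tag paths, that similarity is the Jaccard index of those sets, and that a greedy single-pass clustering with a hypothetical threshold of 0.6 is good enough for illustration. All names (TagPathCollector, tag_paths, cluster_by_structure) are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths of one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Structural signature of a page (assumption: a set of tag paths)."""
    collector = TagPathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_structure(pages, threshold=0.6):
    """Greedy clustering: a page joins the first cluster whose
    representative signature is at least `threshold` similar."""
    clusters = []  # list of (representative signature, member ids)
    for page_id, html in pages.items():
        signature = tag_paths(html)
        for representative, members in clusters:
            if jaccard(representative, signature) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((signature, [page_id]))
    return [members for _, members in clusters]


# Hypothetical access history: two product-like pages and one link list.
pages = {
    "item1": "<html><body><div><h1>A</h1><table><tr><td>1</td></tr>"
             "</table></div></body></html>",
    "item2": "<html><body><div><h1>B</h1><table><tr><td>2</td></tr>"
             "<tr><td>3</td></tr></table></div></body></html>",
    "news1": "<html><body><ul><li><a href='#'>x</a></li></ul></body></html>",
}
clusters = cluster_by_structure(pages)
print(clusters)  # [['item1', 'item2'], ['news1']]
```

The two product-like pages share every tag path and fall into one cluster, while the link list shares only html and html/body with them (similarity 0.2) and forms its own. A cluster like the first one could then serve as an automatically generated training set for rule induction; how sub-structure importance is scored and how rules are derived is left open here because the abstract gives no detail.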


Bibliographic details

Source: International Conference on Data Warehousing and Knowledge Discovery, 2004, 10 pages
Authors: Zhao Li; Wee-Keong Ng; Kok-Leong Ong
Series: Lecture Notes in Computer Science 3181
Original format: PDF
Chinese Library Classification: TP311.13-532

