Semi-automatic wrapper generation for semi-structured websites

Abstract

Many information sources on the Web are semi-structured, which gives automatic tools an opportunity to process and extract their information for easy access through a uniform interface language. Wrapper generation is the creation of wrappers, which contain scripts that extract and integrate data from data sources. Most of these are Web sources, because of the large amount of data available on the World Wide Web. Despite ongoing efforts to automate wrapper generation, wrappers frequently break when the formatting or layout of a data source changes. This thesis presents Wrapster, a new system that semi-automatically generates wrappers for semi-structured Web sources, improves wrapper robustness, and eliminates the need for programming skills and, to a large extent, for manual script creation. Wrapster's novel component is its repairing module, which constantly checks whether any wrapper script has failed and repairs a failing script using stored extracted instances. In addition, Wrapster provides an interactive Web user interface for controlling the wrapper generation process, editing the generated wrappers, and testing their scripts. Wrapster is being tested on the START Question Answering system; however, it is a generic tool that can be used by any QA system that uses the Web as its knowledge base.
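The abstract does not say how the repairing module works internally, so the following is only a minimal sketch of the general idea it describes: detect that a wrapper script has stopped matching, then use a previously extracted instance to find the value in the changed page and derive a new script from its surroundings. It assumes a wrapper is a single regular expression with one capture group; the names `Wrapper`, `induce_pattern`, and `check_and_repair` are hypothetical and are not taken from Wrapster.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Wrapper:
    """Toy wrapper: one regex with a single capture group (assumed format)."""
    pattern: str
    # Stored extracted instances: page key -> last value extracted successfully.
    known_instances: dict = field(default_factory=dict)

    def extract(self, html):
        match = re.search(self.pattern, html)
        return match.group(1) if match else None


def induce_pattern(html, value, context=12):
    """Build a new regex from the literal text around a known value."""
    pos = html.find(value)
    if pos < 0:
        return None  # the stored instance no longer appears on the page
    left = html[max(0, pos - context):pos]
    right = html[pos + len(value):pos + len(value) + context]
    return re.escape(left) + r"(.+?)" + re.escape(right)


def check_and_repair(wrapper, key, html):
    """Run the wrapper; on failure, try to repair it from the stored instance."""
    value = wrapper.extract(html)
    if value is not None:
        wrapper.known_instances[key] = value  # remember for future repairs
        return value
    stored = wrapper.known_instances.get(key)
    if stored is None:
        return None  # nothing to repair from
    new_pattern = induce_pattern(html, stored)
    if new_pattern is None:
        return None
    wrapper.pattern = new_pattern
    return wrapper.extract(html)


old_page = "<tr><td>Capital:</td><td>Paris</td></tr>"
new_page = '<div class="fact"><span>Capital</span><b>Paris</b></div>'
other_page = '<div class="fact"><span>Capital</span><b>Rome</b></div>'

wrapper = Wrapper(r"<td>Capital:</td><td>(.+?)</td>")
first = check_and_repair(wrapper, "France", old_page)     # original layout
repaired = check_and_repair(wrapper, "France", new_page)  # layout changed
reused = wrapper.extract(other_page)                      # repaired script, new page
```

In this sketch the repair succeeds only while the stored value still appears verbatim on the changed page. A practical system would also need to handle values that change over time and to validate the repaired script against several stored instances before accepting it.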
