Conference: International Workshop on Database and Expert Systems Applications
Automatic Wrapper Generation for Semi-Structured Biological Data Based on Table Structure Identification

Abstract

Biological data analyses usually require complex manipulations: applying tools, navigating multiple web sites, selecting and filtering results, and iterating over the internet. Most biological data are generated from structured databases and by applications, and are presented to users embedded within repeated structures, or tables, in HTML documents. In this paper we outline a novel technique for identifying table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents, and we therefore argue that it outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system, which exploits the theoretical developments presented in this paper and emerges as an elegant automatic wrapper generation system.
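The abstract does not spell out the identification algorithm, so the following is only an illustrative sketch of the general idea of finding "repeated structures" in HTML, not the authors' method and not PickUp's implementation. It assumes a simple heuristic: an element counts as table-like when at least two of its children share the same nested tag signature. All names (`Node`, `TreeBuilder`, `signature`, `find_repeated`, `SAMPLE`) and the sample values are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, enough for a sketch).
VOID = {"br", "hr", "img", "input", "meta", "link", "col", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class TreeBuilder(HTMLParser):
    """Builds a lightweight element tree from possibly sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def signature(node):
    """Nested tag shape of a subtree; text and attributes are ignored."""
    return (node.tag, tuple(signature(c) for c in node.children))


def all_text(node):
    parts = list(node.text)
    for child in node.children:
        parts.extend(all_text(child))
    return parts


def find_repeated(node, min_rows=2):
    """Return one list of rows per element whose children repeat a shape."""
    found = []
    shapes = Counter(signature(c) for c in node.children if c.children)
    if shapes:
        best, count = shapes.most_common(1)[0]
        if count >= min_rows:
            rows = [c for c in node.children if signature(c) == best]
            found.append([[" ".join(all_text(cell)) for cell in row.children]
                          for row in rows])
    for child in node.children:
        found.extend(find_repeated(child, min_rows))
    return found


# Made-up input: one <div>-based "table" and one standard <table>.
SAMPLE = """
<div class="results">
  <div class="hit"><span>A1</span><span>alpha</span></div>
  <div class="hit"><span>B2</span><span>beta</span></div>
</div>
<table>
  <tr><td>name</td><td>score</td></tr>
  <tr><td>X</td><td>0.9</td></tr>
</table>
"""

for table in find_repeated(parse(SAMPLE)):
    print(table)
```

Because the heuristic looks only at repeated tag shapes, it treats the `<div>`-based listing and the `<table>` element the same way, which is the sense in which non-standard tables can be picked up. It is deliberately naive: rows whose cells contain differing markup would not match each other, and a single row whose cells each hold nested elements would itself be reported as a repeated structure.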
