Venue: ACM SIGMOD International Conference on Management of Data

An XML-based Wrapper Generator for Web Information Extraction

Abstract

There has been tremendous interest in information integration systems that automatically gather, manipulate, and integrate data from multiple information sources on a user's behalf. Unfortunately, web sites are designed primarily for human browsing rather than for use by computer programs, and mechanically extracting their content is in general difficult, if not impossible [4]. Software systems that use such web sources typically rely on hand-coded wrappers to extract the information of interest and to translate query responses into a more structured format (e.g., relational form) before unifying them into an integrated answer to the user's query. The most recent generation of information mediator systems (e.g., Ariadne [3], CQ [5, 7], Internet Softbots [4], TSIMMIS [2]) addresses this problem by allowing a pre-wrapped set of web sources to be accessed through database-like queries.
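To make the notion of a hand-coded wrapper concrete, here is a minimal illustrative sketch, not the paper's XML-based generator. It assumes a hypothetical source page that lists its data in an HTML table. The wrapper pulls each data row out of the page and returns it as a tuple, turning presentation markup into something close to relational form. The class name `TableRowWrapper`, the function `wrap`, and the sample page are all hypothetical. The code uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class TableRowWrapper(HTMLParser):
    """Hypothetical hand-coded wrapper: turns <tr>/<td> markup into tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []    # extracted tuples (relational form)
        self._row = None  # cells of the <tr> currently being parsed
        self._cell = None  # text fragments of the <td> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a <td> is kept; headings and layout text are ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells yield no tuple
                self.rows.append(tuple(self._row))
            self._row = None


def wrap(html):
    """Run the wrapper over one page and return its rows as tuples."""
    parser = TableRowWrapper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical source page, invented for illustration only.
SAMPLE_PAGE = """<html><body><h1>Weather</h1><table>
<tr><th>City</th><th>High (F)</th></tr>
<tr><td>Atlanta</td><td>88</td></tr>
<tr><td>Portland</td><td><b>71</b></td></tr>
</table></body></html>"""

if __name__ == "__main__":
    print(wrap(SAMPLE_PAGE))
```

Run on the sample page, `wrap` yields `[('Atlanta', '88'), ('Portland', '71')]`. The extraction logic hard-codes assumptions about the page's markup, so a layout change on the source site can silently break it. That maintenance cost is what motivates generating wrappers automatically rather than writing them by hand.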