Published in: International Conference on Conceptual Modeling

Automatic Hidden-Web Table Interpretation by Sibling Page Comparison

Abstract

The longstanding problem of automatic table interpretation still eludes us. Its solution would not only aid table-processing applications such as large-volume table conversion, but would also help solve related problems such as information extraction and semi-structured data management. In this paper, we offer a conceptual modeling solution for the common special case in which so-called sibling pages are available. The sibling pages we consider are pages on the hidden web, commonly generated from underlying databases. We compare them to identify and connect nonvarying components (category labels) and varying components (data values). We tested our solution using more than 2,000 tables in source pages from three different domains: car advertisements, molecular biology, and geopolitical information. Experimental results show that the system can successfully identify sibling tables, generate structure patterns, interpret tables using the generated patterns, and automatically adjust the structure patterns, if necessary, as it processes a sequence of hidden-web pages. For these activities, the system achieved an overall F-measure of 94.5%.
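The core idea of comparing sibling pages can be illustrated with a small sketch. This is not the authors' system. It is a simplified illustration that assumes each table has already been parsed into rows of cell text and that all sibling tables have the same shape. The names `classify_cells` and `interpret` are hypothetical. Cells whose text is identical across siblings are treated as category labels, and cells whose text varies are treated as data values.

```python
from typing import Dict, List

Table = List[List[str]]  # a table as rows of cell text (assumed pre-parsed)


def classify_cells(sibling_tables: List[Table]) -> List[List[str]]:
    """Mark each cell position 'label' if its text is identical in every
    sibling table, otherwise 'value'. Assumes identically shaped tables."""
    first = sibling_tables[0]
    for table in sibling_tables:
        same_rows = len(table) == len(first)
        if not same_rows or any(len(r) != len(f) for r, f in zip(table, first)):
            raise ValueError("this sketch requires sibling tables of equal shape")

    pattern = []
    for i, row in enumerate(first):
        kinds = []
        for j, cell in enumerate(row):
            unchanged = all(t[i][j].strip() == cell.strip() for t in sibling_tables)
            kinds.append("label" if unchanged else "value")
        pattern.append(kinds)
    return pattern


def interpret(table: Table, pattern: List[List[str]]) -> Dict[str, str]:
    """Pair each value cell with the nearest label to its left in the same row
    (a simplifying assumption about table layout)."""
    pairs: Dict[str, str] = {}
    for row, kinds in zip(table, pattern):
        current_label = None
        for cell, kind in zip(row, kinds):
            if kind == "label":
                current_label = cell.strip()
            elif current_label is not None:
                pairs[current_label] = cell.strip()
    return pairs


# Made-up sibling tables in the style of car advertisements.
page_a = [["Make", "Honda"], ["Year", "2004"]]
page_b = [["Make", "Ford"], ["Year", "2001"]]

pattern = classify_cells([page_a, page_b])
print(pattern)
print(interpret(page_b, pattern))
```

A known weakness of this naive comparison is that a data value which happens to be the same on every compared page (for example, two cars from the same year) would be mistaken for a label. Comparing more sibling pages reduces that risk.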
