Journal: Data & Knowledge Engineering

Automatic hidden-web table interpretation, conceptualization, and semantic annotation

Abstract

The longstanding problem of automatic table interpretation still eludes us. Its solution would not only aid table processing applications such as large-volume table conversion, but would also help solve related problems such as information extraction, semantic annotation, and semi-structured data management. In this paper, we offer a solution for the common special case in which so-called sibling pages are available. The sibling pages we consider are pages on the hidden web, commonly generated from underlying databases. Our system compares them to identify and connect nonvarying components (category labels) and varying components (data values). We tested our solution using more than 2000 tables in source pages from three different domains: car advertisements, molecular biology, and geopolitical information. Experimental results show that the system can successfully identify sibling tables, generate structure patterns, interpret tables using the generated patterns, and automatically adjust the structure patterns as it processes a sequence of hidden-web pages. For these activities, the system was able to achieve an overall F-measure of 94.5%. Further, given that we can automatically interpret tables, we next show that this leads immediately to a conceptualization of the data in these interpreted tables and thus also to a way to semantically annotate these interpreted tables with respect to the ontological conceptualization. Labels in nested table structures yield ontological concepts and interrelationships among these concepts, and associated data values become annotated information. We further show that semantically annotated data leads immediately to queryable data. Thus, the entire process, which is fully automatic, transforms facts embedded within tables into facts accessible by standard query engines.
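The sibling-page comparison described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the sibling tables have already been extracted into same-shaped grids of cell text, and the names `classify_cells` and `interpret` as well as the sample car-advertisement rows are hypothetical. Cells whose text is identical across all siblings are treated as category labels, and cells that differ are treated as data values.

```python
from typing import Dict, List

Table = List[List[str]]  # rows of cell text; an assumed, simplified table model


def classify_cells(siblings: List[Table]) -> List[List[str]]:
    """Mark each cell position 'label' if its text is identical in every
    sibling table, otherwise 'value'. Assumes all tables share one shape."""
    first = siblings[0]
    pattern = []
    for r, row in enumerate(first):
        pattern.append([
            "label" if all(t[r][c] == cell for t in siblings) else "value"
            for c, cell in enumerate(row)
        ])
    return pattern


def interpret(table: Table, pattern: List[List[str]]) -> Dict[str, str]:
    """Pair each value cell with the nearest label to its left in the same row."""
    result: Dict[str, str] = {}
    for row, kinds in zip(table, pattern):
        current_label = None
        for cell, kind in zip(row, kinds):
            if kind == "label":
                current_label = cell
            elif current_label is not None:
                result[current_label] = cell
    return result


# Hypothetical sibling tables from two car-advertisement pages.
page_a = [["Make", "Honda"], ["Year", "2004"], ["Price", "$8,500"]]
page_b = [["Make", "Ford"], ["Year", "1999"], ["Price", "$3,200"]]

pattern = classify_cells([page_a, page_b])
print(interpret(page_b, pattern))
```

A value that happens to be identical in every sibling would be misclassified as a label under this simple rule, so comparing more sibling pages makes the classification more reliable. Nested tables and tables of differing shapes are outside the scope of this sketch.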
