Web Information System and Application Conference

Web Data Extraction Based on Visual Information and Partial Tree Alignment

Abstract

Web databases contain a huge amount of structured data that can be obtained only via their query interfaces. The query results are presented in dynamically generated web pages, usually in the form of data records, for human use. Automatic web data extraction is critical to web integration, and a number of approaches have been proposed. Early work is mostly based on the source code or the tag tree of the page. Recent approaches use visual features to extract data and perform better than the earlier work. However, these approaches still have inherent limitations. In this paper, we propose a novel approach that makes use of visual features to extract data from web pages, including both data records and data items. Experiments on a large set of query result pages from different domains show that the proposed approach is highly effective.
