Journal: Computer Standards & Interfaces

Logical structure analysis: From HTML to XML

Abstract

This paper presents an efficient method for extracting the logical structure of a Web document. The method consists of three phases: visual grouping, element identification, and logical grouping. To produce a more accurate logical structure, it defines a document model that describes the logical structure information of a specific document class. Because the method relies both on the visual structure produced by the visual grouping phase and on this document model, it supports sophisticated structure analysis. Experiments on HTML documents from the Web show that the method performs logical structure analysis successfully when compared with previous work. In particular, the method outputs the result of the analysis as XML documents, which enhances the reusability of the documents.
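The three phases named in the abstract can be pictured with a toy pipeline. The sketch below is not the paper's algorithm: the abstract does not describe the actual grouping rules or the document model, so `DOCUMENT_MODEL`, `BlockCollector`, `identify_elements`, `group_logically`, and `html_to_xml` are all hypothetical names, and the "visual grouping" here simply treats each `h1`, `h2`, or `p` element as one block. It only illustrates the overall shape of going from HTML, through a document model, to nested XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical document model: maps a visual cue (here just the HTML tag)
# to a logical element name. The paper's real model is not given in the abstract.
DOCUMENT_MODEL = {"h1": "title", "h2": "heading", "p": "paragraph"}


class BlockCollector(HTMLParser):
    """Phase 1 (visual grouping, heavily simplified): collect text blocks by tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # list of (tag, text) pairs in document order
        self._tag = None   # tag of the block currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in DOCUMENT_MODEL:
            self._tag, self._text = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append((tag, "".join(self._text).strip()))
            self._tag = None


def identify_elements(blocks):
    """Phase 2 (element identification): label each block via the document model."""
    return [(DOCUMENT_MODEL[tag], text) for tag, text in blocks]


def group_logically(elements):
    """Phase 3 (logical grouping): nest paragraphs under the preceding heading."""
    root = ET.Element("document")
    current = root  # paragraphs before any heading attach to the root
    for name, text in elements:
        if name == "title":
            ET.SubElement(root, "title").text = text
        elif name == "heading":
            current = ET.SubElement(root, "section")
            ET.SubElement(current, "heading").text = text
        else:
            ET.SubElement(current, "paragraph").text = text
    return root


def html_to_xml(html):
    """Run the three toy phases and serialize the result as an XML string."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    tree = group_logically(identify_elements(collector.blocks))
    return ET.tostring(tree, encoding="unicode")
```

Calling `html_to_xml("<h1>T</h1><h2>A</h2><p>one</p>")` yields a `document` element containing a `title` and one `section` that wraps the heading and its paragraph. A real system would replace the tag lookup with layout-based cues and a document model specific to each document class, as the abstract suggests.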
