Mining Table Information on the Internet

Abstract

When creating HTML documents, authors use various methods to convey their intentions clearly. Among these methods, this paper focuses on tables, because tables are commonly used in many documents to make meaning clear and are easy to recognize, since web documents mark them up with tags that carry additional information. On the Internet, tables are used both to structure knowledge and to design the layout of documents. Thus, our first goal is to classify tables into two types: meaningful tables and decorative tables. However, this is not easy, because HTML does not separate presentation from structure. This paper proposes a method for extracting meaningful tables using a modified k-means algorithm and compares it with other methods. The experimental results show that table classification on web documents is promising.
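The abstract names the technique (a modified k-means) but not the table features or the nature of the modification. The following is a minimal Python sketch under stated assumptions, not the authors' method: each `<table>` is reduced to four hypothetical features (row count, mean cells per row, share of cells containing text, share of `<th>` header cells), the features are min-max scaled, and standard, unmodified k-means with k = 2 and deterministic farthest-point seeding splits the tables into two clusters. The rule that labels the cluster with more text and header cells as "meaningful" is also an assumption, and the helper names (`TableFeatureParser`, `classify_tables`, and so on) are illustrative only.

```python
import math
from html.parser import HTMLParser

class TableFeatureParser(HTMLParser):
    """Collects a small, hypothetical feature vector for every <table>."""

    def __init__(self):
        super().__init__()
        self.features = []  # one vector per table, in the order tables close
        self._open = []     # stack of counters, so nested tables stay separate

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append({"rows": 0, "cells": 0, "th": 0, "text": 0, "buf": None})
        elif self._open:
            t = self._open[-1]
            if tag == "tr":
                t["rows"] += 1
            elif tag in ("td", "th"):
                t["cells"] += 1
                t["th"] += 1 if tag == "th" else 0
                t["buf"] = []  # start collecting this cell's text

    def handle_data(self, data):
        if self._open and self._open[-1]["buf"] is not None:
            self._open[-1]["buf"].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        t = self._open[-1]
        if tag in ("td", "th") and t["buf"] is not None:
            # count the cell as a text cell if it holds any visible text
            t["text"] += 1 if "".join(t["buf"]).strip() else 0
            t["buf"] = None
        elif tag == "table":
            self._open.pop()
            cells = max(t["cells"], 1)
            self.features.append([
                t["rows"],                       # number of rows
                t["cells"] / max(t["rows"], 1),  # mean cells per row
                t["text"] / cells,               # share of cells with text
                t["th"] / cells,                 # share of <th> header cells
            ])

def min_max_scale(rows):
    """Rescale each feature to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def kmeans(points, k=2, max_iter=100):
    """Standard Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:  # next seed: the point farthest from current seeds
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centre
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def classify_tables(html):
    """Return True for each table placed in the cluster assumed 'meaningful'."""
    parser = TableFeatureParser()
    parser.feed(html)
    parser.close()
    if len(parser.features) < 2:
        raise ValueError("need at least two tables to form two clusters")
    labels, centers = kmeans(min_max_scale(parser.features), k=2)
    # Hypothetical rule: the cluster whose centre has the larger share of text
    # and header cells is taken to be the "meaningful" one.
    meaningful = max(range(2), key=lambda j: centers[j][2] + centers[j][3])
    return [lab == meaningful for lab in labels]

SAMPLE = """
<table><tr><td><img src="logo.gif"></td><td></td></tr></table>
<table><tr><td></td><td><a href="/">Home</a></td><td></td></tr></table>
<table><tr><th>Country</th><th>Capital</th></tr>
  <tr><td>France</td><td>Paris</td></tr><tr><td>Japan</td><td>Tokyo</td></tr>
  <tr><td>Kenya</td><td>Nairobi</td></tr></table>
<table><tr><th>Year</th><th>Sales</th></tr>
  <tr><td>2001</td><td>120</td></tr><tr><td>2002</td><td>135</td></tr></table>
"""

if __name__ == "__main__":
    print(classify_tables(SAMPLE))  # [False, False, True, True]
```

On this sample page, the two layout tables (a logo cell next to an empty spacer, and a navigation row) fall into one cluster and the two data tables with header rows fall into the other. Min-max scaling is used because the raw row count is unbounded while the other three features lie in [0, 1]; without it, row count would dominate the Euclidean distances.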