
Extraction of Meaningful Tables from the Internet Using Decision Trees



Abstract

Information retrieval systems currently in use do not consider the structural information of documents; instead, they rely on indexes extracted from the documents. Structural information such as font face, font size, indentation, and tables conveys the author's intended meaning and is a primary means of documentation. This paper focuses on tables because they are commonly used in documents to make meaning clear, and in web documents they are easy to recognize because the markup tags supply additional information. On the Internet, tables are used both to structure knowledge and to lay out documents. This paper proposes a method that uses a decision tree to extract meaningful tables and builds a dictionary of table indexes, which is then applied to an information retrieval system to improve its accuracy.
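The abstract does not state which table features the decision tree tests or how the index dictionary is organized. The sketch below is one minimal way to realize the two steps in Python (standard library only), under explicit assumptions: each top-level HTML table is described by four hypothetical features (row count, maximum cells per row, presence of `<th>` header text, presence of a nested table); a small CART-style tree is grown with the Gini criterion on a hypothetical hand-labeled toy set; and tables classified as `data` contribute their header words to an inverted index. The names `TableParser`, `features`, `build_tree`, `predict`, and `build_index`, the features, and the training data are illustrative and are not taken from the paper.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects hypothetical structural features for each top-level <table>."""

    def __init__(self):
        super().__init__()
        self.tables, self.depth, self.cur = [], 0, None  # results, <table> nesting, current table
        self.row_cells, self.in_th = 0, False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.cur = {"rows": 0, "cols": 0, "nested": 0, "headers": []}
            else:
                self.cur["nested"] = 1  # a table inside a table, typical of page layout
        elif self.depth == 1:  # ignore rows and cells of nested tables
            if tag == "tr":
                self.cur["rows"] += 1
                self.row_cells, self.in_th = 0, False
            elif tag in ("td", "th"):
                self.row_cells += 1
                self.cur["cols"] = max(self.cur["cols"], self.row_cells)
                self.in_th = tag == "th"

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False
        elif tag == "table" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.tables.append(self.cur)

    def handle_data(self, data):
        # Text of the outer table's <th> cells becomes candidate index terms.
        if self.in_th and self.depth == 1 and data.strip():
            self.cur["headers"].append(data.strip())

def features(t):
    # [rows, max cells per row, has <th> text, has nested table]
    return [t["rows"], t["cols"], int(bool(t["headers"])), t["nested"]]

def gini(labels):
    return 1.0 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    # CART-style: a leaf is a label, an inner node is (feature, threshold, left, right).
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if left and right:  # weighted Gini impurity of the split; lower is better
                score = sum(len(s) * gini([y[i] for i in s]) for s in (left, right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t, left, right)
    if best is None:  # identical feature vectors with mixed labels
        return majority
    _, f, t, left, right = best
    children = [build_tree([X[i] for i in s], [y[i] for i in s], depth + 1, max_depth)
                for s in (left, right)]
    return (f, t, *children)

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def build_index(tables, labels):
    # Inverted index: lower-cased header word -> ids of tables labeled "data".
    index = defaultdict(set)
    for tid, (table, label) in enumerate(zip(tables, labels)):
        if label == "data":
            for term in " ".join(table["headers"]).lower().split():
                index[term].add(tid)
    return index

# Hypothetical hand-labeled toy data: [rows, cols, has_th, has_nested].
X_train = [[5, 3, 1, 0], [2, 3, 1, 0], [6, 2, 0, 0],
           [1, 2, 0, 1], [3, 1, 0, 1], [1, 3, 0, 0]]
y_train = ["data", "data", "data", "layout", "layout", "layout"]
tree = build_tree(X_train, y_train)

page = """<table><tr><th>Year</th><th>Sales volume</th></tr>
<tr><td>2001</td><td>10</td></tr><tr><td>2002</td><td>12</td></tr></table>
<table><tr><td><table><tr><td>menu</td></tr></table></td><td>body text</td></tr></table>"""
parser = TableParser()
parser.feed(page)
parser.close()
labels = [predict(tree, features(t)) for t in parser.tables]
index = build_index(parser.tables, labels)
```

On this toy page, the first table (a header row plus two data rows) is classified as `data` and the outer table that wraps a nested menu is classified as `layout`, so only `year`, `sales`, and `volume` enter the index. A practical system would presumably need many labeled pages and richer features; a library learner such as scikit-learn's `DecisionTreeClassifier` could stand in for the hand-written `build_tree`.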
