
Extraction of Meaningful Tables from the Internet Using Decision Trees


Abstract

Information retrieval systems currently in use do not consider the structural information of documents; they rely instead on indexes extracted from the documents. Structural information such as font face, font size, indentation, and tables conveys the author's intent and is a primary means of documentation. This paper pays special attention to tables, because they are commonly used in documents to make meaning clear and are easy to recognize in web documents, which mark them up with tags. On the Internet, tables are used both to structure knowledge and to lay out the design of documents. This paper proposes a method that uses a decision tree to extract meaningful tables and constructs a dictionary of table indexes, so that the result can be applied to an information retrieval system to improve its accuracy.
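The abstract does not list the features or the tree itself, so the following is only a minimal sketch of the general idea, assuming a handful of structural features and a hand-written tree. The feature names (`rows`, `cells`, `headers`, `nested`), the thresholds, and the functions `extract_table_features` and `is_meaningful` are all hypothetical illustrations, not the paper's method; in practice the tree would presumably be learned from labeled tables.

```python
from html.parser import HTMLParser


class TableFeatureParser(HTMLParser):
    """Collects simple structural features for each top-level <table>.

    The chosen features are assumptions made for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.tables = []      # one feature dict per top-level table
        self._depth = 0       # current <table> nesting depth
        self._current = None  # features of the table being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self._current = {"rows": 0, "cells": 0,
                                 "headers": 0, "nested": 0}
            else:
                self._current["nested"] += 1
        elif self._current is not None and self._depth == 1:
            # Only count rows and cells that belong to the top-level table.
            if tag == "tr":
                self._current["rows"] += 1
            elif tag in ("td", "th"):
                self._current["cells"] += 1
                if tag == "th":
                    self._current["headers"] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.tables.append(self._current)
                self._current = None


def extract_table_features(html):
    """Return a list of feature dicts, one per top-level table."""
    parser = TableFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.tables


def is_meaningful(f):
    """Hypothetical hand-written decision tree; thresholds are illustrative."""
    if f["nested"] > 0:
        return False               # nested tables suggest page layout
    if f["headers"] > 0:
        return f["rows"] >= 2      # a header row plus at least one data row
    return f["rows"] >= 3 and f["cells"] >= 2 * f["rows"]


SAMPLE = """
<table><tr><td><table><tr><td>menu</td></tr></table></td></tr></table>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
"""

if __name__ == "__main__":
    for features in extract_table_features(SAMPLE):
        print(features, is_meaningful(features))
```

In this sample the first table wraps another table and is treated as layout, while the second has header cells and a data row and is treated as meaningful. The cell text of tables classified as meaningful could then feed the dictionary of table indexes that the abstract mentions.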
