International Conference on Information Technology Systems and Innovation

Information extraction in statistics indicator tables using rule generalizations and ontology

Abstract

The main problem with rule-based information extraction is that extraction rules tend to be designed for one specific kind of information or document structure, so they cannot be reused elsewhere without modification. Semi-structured documents such as tables present a further challenge: because there is no standard for how tables are designed, their structure varies. Statistics indicators are a source of information that uses tables to present data. Indicators are also related to one another, and these relationships must be carefully identified and extracted. Generalized rules attempt to reduce the effort of modifying extraction rules by expressing them in general terms. Combined with an ontology, the rules can also extract the relationships between indicators. The output of this information extraction system is a database that stores not only the data itself but also the relationships between indicators.
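The idea of a generalized rule combined with an ontology can be illustrated with a minimal sketch. This is not the paper's implementation: the toy `ONTOLOGY` mapping, the year-pattern header rule, and the function names `find_header` and `extract` are all assumptions made for illustration. The rule locates year columns by pattern instead of by fixed position, so the same code handles tables with different layouts, and the ontology supplies each indicator's parent.

```python
import re

# Hypothetical toy ontology (an assumption, not from the paper):
# each indicator maps to its parent indicator, or None for a root.
ONTOLOGY = {
    "Population": None,
    "Male population": "Population",
    "Female population": "Population",
}

YEAR = re.compile(r"^(19|20)\d{2}$")


def find_header(table):
    """Generalized rule: the header is the first row containing year-like
    cells. Year columns are found by pattern, not by fixed position."""
    for r, row in enumerate(table):
        year_cols = {
            c: cell.strip()
            for c, cell in enumerate(row)
            if YEAR.match(cell.strip())
        }
        if year_cols:
            return r, year_cols
    raise ValueError("no header row with year labels found")


def extract(table):
    """Return one record per (indicator, year) cell, with the indicator's
    parent taken from the ontology."""
    header_row, year_cols = find_header(table)
    # Assumed rule: the label column is the first non-year column.
    label_col = min(
        c for c in range(len(table[header_row])) if c not in year_cols
    )
    records = []
    for row in table[header_row + 1:]:
        name = row[label_col].strip()
        if name not in ONTOLOGY:
            continue  # skip rows that are not known indicators
        for c, year in year_cols.items():
            records.append({
                "indicator": name,
                "parent": ONTOLOGY[name],
                "year": int(year),
                "value": float(row[c].replace(",", "")),
            })
    return records


# Two tables with different layouts, handled by the same rule.
table_a = [
    ["Table 1. Population by sex", "", ""],
    ["Indicator", "2010", "2011"],
    ["Population", "1,000", "1,100"],
    ["Male population", "490", "540"],
]
table_b = [
    ["2010", "2011", "Indicator"],
    ["510", "560", "Female population"],
]
records_a = extract(table_a)
records_b = extract(table_b)
```

In this sketch the records could be written to a database table with a `parent` column, which is one simple way to keep the relationship between indicators alongside the data.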
