Conference: Brazilian Symposium on Neural Networks

Automatic Information Extraction in Semi-structured Official Journals

Abstract

Information extraction systems retrieve only the relevant textual information from digital repositories. This work proposes an automatic system for extracting information from semi-structured official journals. Given an input document, a Machine Learning (ML) algorithm classifies the document's fragments into class labels that correspond to the data fields to be extracted. The implemented system supports different feature sets and classification algorithms. It was evaluated experimentally on a sample of 22,770 lines from Pernambuco's Official Journal. Precision was generally good, ranging from 70.14% to 98.63% depending on the feature set and algorithm used to classify the fragments.
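
The fragment-classification idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: fragments are taken to be single lines, the feature names and class labels (`heading`, `field`, `body`) are hypothetical, a small hand-written Bernoulli Naive Bayes stands in for whichever algorithms the paper actually compared, and the toy lines are invented for illustration, not drawn from the journal sample.

```python
import math
from collections import Counter, defaultdict

# Hypothetical binary features; the paper's real feature sets are not given here.
FEATURES = ("all_caps", "has_digit", "ends_with_colon", "short")

def line_features(line):
    """Return the set of feature names that are active for one line."""
    text = line.strip()
    flags = {
        "all_caps": text.isupper(),
        "has_digit": any(ch.isdigit() for ch in text),
        "ends_with_colon": text.endswith(":"),
        "short": len(text.split()) <= 4,
    }
    return frozenset(name for name, on in flags.items() if on)

class BernoulliNaiveBayes:
    """Tiny Bernoulli Naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def fit(self, feature_sets, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(feature_sets, labels):
            self.feature_counts[label].update(feats)
        self.total = len(labels)
        return self

    def predict(self, feats):
        def log_score(label):
            n = self.class_counts[label]
            score = math.log(n / self.total)  # class prior
            for name in FEATURES:
                # Smoothed probability that this feature is active in the class.
                p = (self.feature_counts[label][name] + 1) / (n + 2)
                score += math.log(p if name in feats else 1 - p)
            return score
        return max(self.class_counts, key=log_score)

def precision(y_true, y_pred, label):
    """Share of lines predicted as `label` that truly carry that label."""
    hits = [t for t, p in zip(y_true, y_pred) if p == label]
    if not hits:
        return 0.0
    return sum(t == label for t in hits) / len(hits)

# Invented toy training lines with hypothetical labels.
train = [
    ("DECREE 123", "heading"),
    ("ORDINANCE 45", "heading"),
    ("Signatory:", "field"),
    ("Department:", "field"),
    ("The governor appoints a new secretary of education.", "body"),
    ("This act takes effect on the date of its publication.", "body"),
]
model = BernoulliNaiveBayes().fit(
    [line_features(text) for text, _ in train],
    [label for _, label in train],
)

unseen = [
    "RESOLUTION 7",
    "Published by:",
    "The council approves the annual budget for public works.",
]
predicted = [model.predict(line_features(text)) for text in unseen]
print(predicted)
```

Swapping the feature function or the classifier while keeping `precision` fixed mirrors, in miniature, the kind of feature-set and algorithm comparison the abstract reports.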