
A Machine Learning Approach to Information Extraction


Abstract

Information extraction applies natural language processing to automatically extract the essential details from text documents. A major disadvantage of current approaches is their intrinsic dependence on the application domain and the target language. Several machine learning techniques have been applied to make information extraction systems more portable. This paper describes a general method for building an information extraction system using regular expressions together with supervised learning algorithms. In this method, extraction decisions are led by a set of classifiers instead of sophisticated linguistic analyses. The paper also presents a system called TOPO that extracts information related to natural disasters from newspaper articles in Spanish. Experimental results with this system indicate that the proposed method can be a practical solution for building information extraction systems, reaching an F-measure as high as 72%.
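
The abstract does not give the patterns, features, or classifiers used in TOPO, so the following is only a minimal sketch of the general idea it describes: a regular expression proposes candidate text segments, and a supervised classifier decides from the surrounding words whether each candidate should be extracted. Everything in it is an assumption made for illustration: the integer pattern, the three-word context window, the naive Bayes classifier, the slot name `deaths`, and the English toy sentences (the paper works on Spanish news).

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical candidate pattern: any standalone integer.
CANDIDATE = re.compile(r"\b\d+\b")

def candidates(text, window=3):
    """Yield (value, context_features) for every regex match in text."""
    for m in CANDIDATE.finditer(text):
        left = text[:m.start()].lower().split()[-window:]
        right = text[m.end():].lower().split()[:window]
        feats = ["L_" + w for w in left] + ["R_" + w for w in right]
        yield m.group(), feats

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self):
        self.feat_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())

        def score(label):
            n = sum(self.feat_counts[label].values())
            s = math.log(self.label_counts[label] / total)
            for f in feats:
                s += math.log((self.feat_counts[label][f] + 1) / (n + len(self.vocab)))
            return s

        return max(self.label_counts, key=score)

# Invented toy training data: one numeric candidate per sentence.
TRAIN = [
    ("The earthquake left 12 people dead in the region", "deaths"),
    ("The flood killed 30 residents on Monday", "deaths"),
    ("The storm hit on 5 May near the coast", "other"),
    ("Winds reached 120 km per hour overnight", "other"),
]

examples = [(feats, label) for text, label in TRAIN for _, feats in candidates(text)]
clf = NaiveBayes().fit(examples)

def extract(text, classifier):
    """Keep only the candidates the classifier assigns to a real slot."""
    results = []
    for value, feats in candidates(text):
        label = classifier.predict(feats)
        if label != "other":
            results.append((value, label))
    return results

print(extract("The hurricane left 45 people dead", clf))
print(extract("Gusts of 90 km per hour were recorded", clf))
```

In this sketch the regular expression does no interpretation at all. It only narrows the text down to plausible segments, and the decision to extract rests entirely on the classifier's view of the context, which is the division of labor the abstract attributes to the method.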
