Conference: International Conference on Innovative and Creative Information Technology

Information extraction on novel text using machine learning and rule-based system



Abstract

A novel consists of around 30,000 to 50,000 words in total. It usually tells a story about entities, such as Person, Location, or Organization, and the relations among them. To grasp this information, a reader must read the whole novel, which is time-consuming. This research proposes a solution: automatic extraction of entity relations by means of Information Extraction (IE). The technique is divided into two steps. First, all entities are retrieved from the input text using Named Entity Recognition (NER). Afterward, all relations are extracted by a Relation Extraction (RE) process. This research implements an IE system covering both NER and RE, which employs a supervised machine learning approach combined with a rule-based system. The main purpose of this research is to determine which machine learning features and algorithms yield the best results, and which rules are most suitable for the characteristics of novels.
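The two-step flow described in the abstract (NER first, then RE over the recognized entities) can be illustrated with a minimal sketch. It covers only a rule-based half and is not the authors' system: the cue lists `PERSON_TITLES`, `LOCATION_PREPOSITIONS`, and `RELATION_CUES`, the relation names, and all function names are hypothetical, chosen for illustration. The supervised machine learning component mentioned in the abstract is not shown, since the abstract does not specify its features or algorithm.

```python
from typing import List, Tuple

# Hypothetical cue lists; the abstract does not state the paper's actual rules.
PERSON_TITLES = {"Mr.", "Mrs.", "Miss", "Dr."}
LOCATION_PREPOSITIONS = {"in", "at", "to", "from"}
RELATION_CUES = {"lives": "lives_in", "lived": "lives_in", "travelled": "travels_to"}

Entity = Tuple[str, str, int, int]   # (text, label, start token, end token)
Relation = Tuple[str, str, str]      # (subject, relation, object)


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace and strip punctuation, keeping titles such as 'Mr.'."""
    tokens = []
    for raw in sentence.split():
        tokens.append(raw if raw in PERSON_TITLES else raw.strip(",.;:!?\"'"))
    return [t for t in tokens if t]


def recognize_entities(tokens: List[str]) -> List[Entity]:
    """Step 1 (NER): label runs of capitalized tokens using context cues."""
    entities = []
    i = 0
    while i < len(tokens):
        start = i
        has_title = tokens[i] in PERSON_TITLES
        if has_title:
            i += 1
        # Consume the run of capitalized tokens that forms the name.
        while (i < len(tokens) and tokens[i][:1].isupper()
               and tokens[i] not in PERSON_TITLES):
            i += 1
        name_len = i - start - (1 if has_title else 0)
        if name_len == 0:
            i = max(i, start + 1)   # no name here; move on
            continue
        if has_title:
            label = "PERSON"
        elif start > 0 and tokens[start - 1].lower() in LOCATION_PREPOSITIONS:
            label = "LOCATION"
        else:
            continue                # capitalized run without a cue is skipped
        entities.append((" ".join(tokens[start:i]), label, start, i))
    return entities


def extract_relations(tokens: List[str], entities: List[Entity]) -> List[Relation]:
    """Step 2 (RE): link adjacent PERSON/LOCATION pairs via a cue word between them."""
    relations = []
    for a, b in zip(entities, entities[1:]):
        if a[1] == "PERSON" and b[1] == "LOCATION":
            for tok in tokens[a[3]:b[2]]:
                if tok.lower() in RELATION_CUES:
                    relations.append((a[0], RELATION_CUES[tok.lower()], b[0]))
                    break
    return relations


def extract(sentence: str) -> List[Relation]:
    tokens = tokenize(sentence)
    return extract_relations(tokens, recognize_entities(tokens))


print(extract("Mr. Darcy lives in Pemberley."))
```

In a combined system of the kind the abstract describes, a trained classifier would presumably supply or refine the entity labels and relation decisions, with rules like these handling cases specific to novels; how the two are combined in the paper is not stated in the abstract.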
