Conference: International Conference on ICT for Smart Society

Defined entity extraction based on Indonesian text document

Abstract

Entity extraction is part of the process of extracting information from unstructured text documents. It is important to know whether the words in a document are useful and carry important information. With the growth of technology, including websites and the internet, both semantic and technical challenges arise in making entity extraction more efficient. Several tools already provide name finder extraction, and OpenNLP is a good instrument for this purpose. Entities such as person names, locations, and organizations are the terms that define the field of entity extraction. To generate a model from a training set, the Indonesian articles and documents need to be plentiful and diverse so that the entity types can be clearly differentiated from one another. Several problems need to be minimized, notably in accuracy and efficiency. The training set also needs a higher percentage of custom and unique sentences. The results shown are based on the training set and the generated model. The articles are mainly in Indonesian, a language for which no OpenNLP model has yet been created.
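The abstract stresses that the training set should be plentiful and diverse across person, location, and organization entities. A minimal sketch of how that diversity could be checked is shown below, assuming the training data uses the OpenNLP name finder annotation format (one sentence per line, spans marked as `<START:type> ... <END>`). The sample sentences, the function names `extract_entities` and `summarize`, and the choice of "distinct surface forms" as a diversity measure are illustrative assumptions, not taken from the paper.

```python
import re
from collections import Counter

# Matches one annotated span in the OpenNLP name finder training format:
#   <START:type> token token <END>
# Assumption: every span carries an explicit type after "START:".
SPAN = re.compile(r"<START:(\w+)>\s+(.*?)\s+<END>")


def extract_entities(line):
    """Return (type, text) pairs for every annotated span in one sentence."""
    return [(m.group(1), m.group(2)) for m in SPAN.finditer(line)]


def summarize(lines):
    """Map each entity type to (number of spans, number of distinct texts)."""
    totals = Counter()
    distinct = {}
    for line in lines:
        for etype, text in extract_entities(line):
            totals[etype] += 1
            distinct.setdefault(etype, set()).add(text)
    return {t: (totals[t], len(distinct[t])) for t in totals}


# Hypothetical sample sentences, invented for illustration only.
SAMPLE = [
    "<START:person> Joko Widodo <END> mengunjungi <START:location> Surabaya <END> .",
    "<START:organization> Bank Indonesia <END> berkantor di <START:location> Jakarta <END> .",
    "<START:person> Joko Widodo <END> tiba di <START:location> Jakarta <END> .",
]

if __name__ == "__main__":
    for etype, (count, unique) in sorted(summarize(SAMPLE).items()):
        print(f"{etype}: {count} spans, {unique} distinct")
```

A type whose span count is much larger than its distinct count (such as `person` in the sample) suggests the same names are repeated, which is one simple signal that more varied sentences may be needed before training.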
