Conference: 9th International Conference on Language Resources and Evaluation

N³ - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format

Abstract

Extracting Linked Data that follows Semantic Web principles from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards realizing the Semantic Web vision and developing highly accurate tools is the availability of data for validating the quality of Named Entity Recognition and Disambiguation processes, as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N³). All of them are available under a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.