ACM Transactions on Asian Language Information Processing

Chinese Open Relation Extraction and Knowledge Base Establishment


Abstract

Named entity relation extraction is an important topic in information extraction. Although many English extractors achieve reasonable performance, no effective system for Chinese relation extraction has yet been developed, owing to the lack of annotated Chinese corpora and the particularities of Chinese linguistics. In this article, we summarize three kinds of unique but common phenomena in Chinese linguistics and investigate unsupervised, linguistics-based Chinese open relation extraction (ORE), which can automatically discover arbitrary relations without any manually labeled datasets; we also study the construction of a large-scale corpus. By mapping entity relations onto dependency trees and taking the unique characteristics of Chinese into account, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). The model imposes no restrictions on the relative positions of entities and relationships, and it achieves a high yield by extracting relations mediated by verbs or nouns and by processing parallel clauses. Empirical results demonstrate the effectiveness of the method: it performs stably on four heterogeneous datasets and achieves better precision and recall than several Chinese ORE systems. Furthermore, by applying our method to web text, we establish and publish a large-scale knowledge base of entities and relations, called COER, which addresses the shortage of Chinese corpora.
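
To make the idea of reading relations off a dependency tree more concrete, the following is a minimal sketch of verb-mediated triple extraction with subject sharing across a coordinated (parallel) clause. It is not the DSNF model described in the abstract, whose rules are not given here. The `Token` structure, the function names, the hand-written parse, and the dependency labels (`SBV`, `VOB`, `COO`, and so on, in the style of common Chinese dependency tag sets) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str
    head: int  # index of the head token, 0 for the root
    rel: str   # dependency label (assumed tag set, for illustration only)


def children(tokens, head_idx, rel):
    """Return the tokens attached to `head_idx` with dependency label `rel`."""
    return [t for t in tokens if t.head == head_idx and t.rel == rel]


def extract_verb_triples(tokens, entities):
    """Collect (subject, predicate, object) triples whose arguments are known entities."""
    triples = []
    for pred in tokens:
        subjects = [t for t in children(tokens, pred.idx, "SBV") if t.word in entities]
        objects = [t for t in children(tokens, pred.idx, "VOB") if t.word in entities]
        # A coordinated predicate without its own subject borrows the subject
        # of the predicate it is coordinated with (parallel-clause handling).
        if not subjects and pred.rel == "COO":
            subjects = [t for t in children(tokens, pred.head, "SBV") if t.word in entities]
        for s in subjects:
            for o in objects:
                triples.append((s.word, pred.word, o.word))
    return triples


# Hand-written parse of "张三 访问 了 北京 并 参观 了 故宫"
# ("Zhang San visited Beijing and toured the Forbidden City").
SENTENCE = [
    Token(1, "张三", 2, "SBV"),
    Token(2, "访问", 0, "HED"),
    Token(3, "了", 2, "RAD"),
    Token(4, "北京", 2, "VOB"),
    Token(5, "并", 6, "LAD"),
    Token(6, "参观", 2, "COO"),
    Token(7, "了", 6, "RAD"),
    Token(8, "故宫", 6, "VOB"),
]
ENTITIES = {"张三", "北京", "故宫"}

triples = extract_verb_triples(SENTENCE, ENTITIES)
for triple in triples:
    print(triple)
```

Because the sketch matches on tree structure rather than word order, it places no constraint on where the entities sit relative to the predicate, which is the property the abstract highlights. A real system would obtain the parse and the entity list from a parser and a named entity recognizer rather than from hand-written data.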
