Conference: IEEE International Conference on Cloud Computing and Big Data Analysis

Multi-language person social relation extraction model based on distant supervision



Abstract

Relation extraction refers to methods that efficiently identify entities in text and extract the semantic relations between them. Person social relation extraction is one of the most important subfields of relation extraction. A large number of relation extraction techniques have been proposed so far, and supervised machine learning methods are the most widely used. However, supervised methods depend on a manually annotated training set, which is costly and time-consuming to build, and this cost limits further improvement of supervised relation extraction models. To address this limitation, we propose a novel person social relation extraction model for both Chinese and English corpora based on distant supervision. Distant supervision makes full use of the information in a knowledge base and provides training data without manual effort; it is particularly effective on very large corpora that contain thousands of relations. In our model, we first apply distant supervision to obtain a weakly labeled data set. A supervised method is then used to train a classifier that is expected to identify the relation between the person entities in an input sentence. Experimental results on real-world datasets show that our model can take advantage of all informative sentences in the knowledge base and outperforms several competitive comparable methods; moreover, it requires no human-labeled training data.
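The two-stage pipeline described above (distant supervision produces weak labels, then a supervised classifier is trained on them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the knowledge base, corpus, and every function and class name are hypothetical; the labeling rule is the standard distant-supervision heuristic (any sentence mentioning both persons of a knowledge-base pair is assumed to express that pair's relation); and the classifier is a multinomial Naive Bayes over the words between the two mentions, swapped in for brevity because the abstract does not name the features or classifier actually used.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy knowledge base: (person, person) -> social relation.
KB = {("Alice", "Bob"): "spouse", ("Carol", "Dave"): "sibling"}

# Hypothetical unlabeled corpus (whitespace-tokenized English only).
CORPUS = [
    "Alice married Bob in 1990 .",
    "Bob and his wife Alice moved to Paris .",
    "Carol is the sister of Dave .",
    "Dave visited his sister Carol last week .",
    "Alice met Bob at a conference .",  # noisy: still labeled "spouse"
]


def distant_label(sentences, kb):
    """Distant-supervision assumption: a sentence that mentions both persons
    of a KB pair is weakly labeled with that pair's relation."""
    labeled = []
    for s in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in s and e2 in s:  # naive substring match, no NER
                labeled.append((s, e1, e2, rel))
    return labeled


def between_tokens(sentence, e1, e2):
    """Lower-cased tokens between the two mentions, in either order."""
    a, b = sentence.find(e1), sentence.find(e2)
    (start, first), (end, _) = sorted([(a, e1), (b, e2)])
    return sentence[start + len(first):end].lower().split()


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, examples):
        # examples: list of (tokens, relation_label)
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for tokens, label in examples:
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in tokens:
                score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Stage 1: weak labels from the KB; Stage 2: supervised training on them.
labeled = distant_label(CORPUS, KB)
model = NaiveBayesRelationClassifier().fit(
    [(between_tokens(s, e1, e2), rel) for s, e1, e2, rel in labeled]
)
print(model.predict(between_tokens("Erin is the sister of Frank .", "Erin", "Frank")))
# -> sibling
```

The last corpus sentence shows the weakness of the distant-supervision assumption: "Alice met Bob" is labeled "spouse" only because the pair appears in the knowledge base. Whitespace tokenization is adequate only for the English toy data; Chinese text has no spaces between words, so a multi-language version would need a word segmenter or character-level features.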
