首页> 中文期刊> 《模式识别与人工智能》 >结合从句级远程监督与半监督集成学习的关系抽取方法

结合从句级远程监督与半监督集成学习的关系抽取方法

     

摘要

Aiming at noisy data in training data and the insufficient use of negative instances in traditional distant supervision relation extraction methods, a relation extraction method combining clause level distant supervision and semi-supervised ensemble learning is proposed. Firstly, the relation instance set is generated by distant supervision. Secondly, based on clause identification, a denoising algorithm is used to reduce the wrongly labeled data in the relation instance set. Thirdly, the lexical features are extracted from relation instances and are transformed into distributed vectors to establish feature dataset. Finally, all positive data and part of negative data in feature dataset are chosen to form labeled dataset, and the other part of negative data are chosen to form unlabeled dataset. A relation classifier is trained through improved semi-supervised ensemble learning algorithm. Experiments show that compared with baseline methods the proposed method achieves higher accuracies and recall.%针对传统基于远程监督的关系抽取方法中存在噪声和负例数据利用不足的问题,提出结合从句级远程监督和半监督集成学习的关系抽取方法.首先通过远程监督构建关系实例集,使用基于从句识别的去噪算法去除关系实例集中的噪声.然后抽取关系实例的词法特征并转化为分布式表征向量,构建特征数据集.最后选择特征数据集中所有正例数据和部分负例数据组成标注数据集,其余的负例数据组成未标注数据集,通过改进的半监督集成学习算法训练关系分类器.实验表明,相比基线方法,文中方法可以获得更高的分类准确率和召回率.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号