Journal: Computer Engineering (计算机工程)

Web Information Extraction System Based on Knowledge Graph (基于知识图谱的Web信息抽取系统)

Abstract

This paper designs a Web information extraction system built on the Chinese knowledge graph CN-DBpedia, with the goal of extracting large volumes of Web information effectively across multiple domains. First, the system uses the knowledge graph to label data items on webpages automatically. These labels contain noise. Then, a fault-tolerant wrapper induction framework learns correct wrappers from the error-containing label sets. Experimental results show that the proposed system achieves higher precision and higher recall than the traditional extraction method that relies on manual annotation. It significantly reduces the human effort required during extraction and can be applied flexibly to large-scale, multi-domain webpage information extraction tasks.
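The abstract names two steps, knowledge-graph-based automatic labeling and fault-tolerant wrapper induction, but does not say how either is implemented. The Python sketch below is a hypothetical illustration only and is not the authors' method. It makes several assumptions:

- A small dictionary (`KG`) stands in for CN-DBpedia.
- Each page has already been parsed into (XPath, text) pairs.
- Labeling is exact string matching against the attribute values of the page's entity.
- Fault tolerance is approximated by a majority vote over absolute XPaths with a `min_support` threshold.

All names and data (`KG`, `PAGES`, `NEW_PAGE`, `auto_label`, `induce_wrappers`, `extract`, `min_support`) are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph standing in for CN-DBpedia:
# entity -> {attribute: value}.
KG = {
    "Beijing": {"country": "China", "population": "21.5 million"},
    "Paris": {"country": "France", "population": "2.1 million"},
    "Berlin": {"country": "Germany", "population": "3.6 million"},
}

# Hypothetical pages from one site template: (entity, [(xpath, text), ...]).
# Assumes text nodes and their absolute XPaths were already extracted from the DOM.
PAGES = [
    ("Beijing", [("/html/body/div[1]/span", "China"),
                 ("/html/body/div[2]/span", "21.5 million"),
                 ("/html/body/ul/li[1]", "Paris")]),
    ("Paris", [("/html/body/div[1]/span", "France"),
               ("/html/body/div[2]/span", "2.1 million"),
               ("/html/body/footer/p", "France")]),        # noisy extra match
    ("Berlin", [("/html/body/div[1]/span", "Germany"),
                ("/html/body/div[2]/span", "3.7 million"),  # page differs from KG
                ("/html/body/p[3]", "3.6 million")]),       # coincidental wrong match
]


def auto_label(pages, kg):
    """Label text nodes whose text equals a KG value for the page's entity.

    Returns {attribute: Counter(xpath -> number of pages labeling that xpath)}.
    """
    votes = defaultdict(Counter)
    for entity, nodes in pages:
        for attr, value in kg.get(entity, {}).items():
            # A set, so each page votes at most once per (attribute, xpath).
            matched = {xpath for xpath, text in nodes if text.strip() == value}
            votes[attr].update(matched)
    return votes


def induce_wrappers(votes, n_pages, min_support=0.5):
    """Per attribute, keep the XPath labeled on the most pages.

    Wrong labels tend to scatter over different paths, while correct ones
    agree on the template path, so a majority vote tolerates some noise.
    """
    wrappers = {}
    for attr, counter in votes.items():
        if not counter:
            continue  # no page produced a label for this attribute
        xpath, count = counter.most_common(1)[0]
        if count / n_pages >= min_support:
            wrappers[attr] = xpath
    return wrappers


def extract(wrappers, nodes):
    """Apply the induced wrappers to one page's (xpath, text) nodes."""
    by_path = dict(nodes)
    return {attr: by_path[xp] for attr, xp in wrappers.items() if xp in by_path}


votes = auto_label(PAGES, KG)
wrappers = induce_wrappers(votes, n_pages=len(PAGES))

# A page for an entity that is not in the KG at all.
NEW_PAGE = [("/html/body/div[1]/span", "Italy"),
            ("/html/body/div[2]/span", "2.8 million"),
            ("/html/body/footer/p", "Italy")]
print(extract(wrappers, NEW_PAGE))
# {'country': 'Italy', 'population': '2.8 million'}
```

The two injected errors do not change the outcome: the footer match on the Paris page and the coincidental match on the Berlin page each win only one vote, while the template paths win two or three. A majority vote is only one simple way to tolerate label noise. A practical system would likely also need to generalize paths across template variants and to match values that are formatted differently from the knowledge graph.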
