Journal: 《计算机应用与软件》 (Computer Applications and Software)

Formal Description of Semi-Structured Data and a Data Extraction Method

Abstract

Formal description and information extraction of semi-structured data are core problems in supporting user queries and information access. As information resources diversify and grow rapidly, existing description and extraction methods suffer from low recall and low precision. To address this, the paper proposes a new formal description method for semi-structured data that redefines the domain concept set and the domain knowledge set. On this basis, it gives the construction process for both sets, including automatic extraction of domain concepts, automatic construction of the relations in the domain knowledge set, and a description of the similarity algorithm. Experimental results show that the proposed method achieves higher recall and precision than existing methods, indicating that it is feasible and effective.
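As a reminder of how the two evaluation metrics named in the abstract are conventionally computed, here is a minimal sketch. It uses the standard set-based definitions of precision and recall; the function name `precision_recall` and the sample items are hypothetical illustrations and are not taken from the paper, whose exact evaluation protocol is not given on this page.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted items against a gold-standard set."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold-standard items that were extracted.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data for illustration only (not from the paper).
extracted_items = {"title", "author", "price", "banner"}
gold_items = {"title", "author", "price", "publisher", "isbn"}
p, r = precision_recall(extracted_items, gold_items)
```

With this sample, three of the four extracted items are correct and three of the five gold items are found, so an extractor can trade one metric against the other; a method that improves both, as the abstract claims, is extracting more of the right items and fewer wrong ones.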
