Journal: 电子设计工程 (Electronic Design Engineering)

An Adaptive Web Information Extraction Method Based on Single-DOM-Tree Feature Pre-classification

     

Abstract

Traditional public-opinion monitoring mostly relies on template-based collection. To reduce manual maintenance, this paper proposes an adaptive Web information extraction method based on single-DOM-tree feature pre-classification. The method has two parts: link pre-classification and information extraction. Link pre-classification uses an SVM classification algorithm that learns from the features information hyperlinks exhibit within a page; same-source Web information extraction is then performed on the classification results. Experiments show that the pre-classification accuracy reaches up to 94.48%, with a recall of 94.77%.
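The link pre-classification step depends on turning each hyperlink in a page's DOM tree into a feature vector. The abstract does not list the features used, so the sketch below is only an illustration under assumed features: the nesting depth of the link in the DOM, the length of its anchor text, and the number of path segments in its URL. The names `LinkFeatureExtractor` and `link_features` are hypothetical. Training is not shown; rows like these could be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkFeatureExtractor(HTMLParser):
    """Collects one record per <a href=...> found in a single page."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current nesting depth in the DOM tree
        self.current = None   # link currently being read, if any
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.current = {"href": href, "dom_depth": self.depth, "text": ""}

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "a" and self.current is not None:
            self.links.append(self.current)
            self.current = None
        self.depth -= 1

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data.strip()


def link_features(html):
    """Return assumed (illustrative) features for every link in the page."""
    parser = LinkFeatureExtractor()
    parser.feed(html)
    parser.close()
    rows = []
    for link in parser.links:
        path = urlparse(link["href"]).path
        rows.append({
            "href": link["href"],
            "dom_depth": link["dom_depth"],
            "text_len": len(link["text"]),
            "path_segments": len([p for p in path.split("/") if p]),
        })
    return rows
```

In this sketch, links to article pages would tend to differ from navigation links in depth and anchor-text length, which is the kind of signal a classifier could learn from; whether the paper uses these particular signals is not stated in the abstract.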
