Published in: 郑州大学学报(理学版) (Journal of Zhengzhou University, Natural Science Edition)

Web Information Extraction from List Pages with Nested Data Records

     

Abstract

This work builds on existing algorithms for mining nested data records by adding a data-region mining step. From the tag tree constructed for a list page containing nested data, the method first finds all data regions and then processes them together: the partial tree alignment algorithm is applied to match all subtrees, producing a global pattern from which all data records are extracted. Compared with the original algorithm, the improved algorithm preserves accuracy while effectively improving efficiency on pages that contain multiple data regions.
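To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of its two main stages: finding data regions in a tag tree, then merging the records of a region into one pattern. It is not the paper's implementation. The names `Node`, `find_regions`, and `global_pattern` are hypothetical, sibling similarity is approximated with `difflib.SequenceMatcher` and an assumed threshold of 0.6, and the merge step works on flat lists of child tags as a simplified stand-in for partial tree alignment.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    """Hypothetical tag-tree node: a tag name plus child nodes."""
    tag: str
    children: list = field(default_factory=list)


def tags(node):
    """Return the preorder tag sequence of a subtree."""
    out = [node.tag]
    for child in node.children:
        out.extend(tags(child))
    return out


def similar(a, b, threshold=0.6):
    """Assumed similarity test: ratio of matching tags in the two subtrees."""
    return SequenceMatcher(None, tags(a), tags(b)).ratio() >= threshold


def find_regions(root):
    """Collect every run of two or more adjacent, similar siblings in the tree."""
    regions, run = [], []
    for child in root.children:
        if run and similar(run[-1], child):
            run.append(child)
        else:
            if len(run) >= 2:
                regions.append(run)
            run = [child]
    if len(run) >= 2:
        regions.append(run)
    # Recurse so that regions nested inside a record are found as well.
    for child in root.children:
        regions.extend(find_regions(child))
    return regions


def global_pattern(records):
    """Merge the child tags of all records into one growing pattern."""
    pattern = []
    for record in records:
        seq = [child.tag for child in record.children]
        merged = []
        matcher = SequenceMatcher(None, pattern, seq)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "insert":
                merged.extend(pattern[i1:i2])   # keep what the pattern already has
            if op in ("insert", "replace"):
                merged.extend(seq[j1:j2])       # add tags the pattern lacks
        pattern = merged
    return pattern


# Toy list page: three records, the second holding a nested list.
root = Node("body", [
    Node("h1"),
    Node("div", [Node("a"), Node("span")]),
    Node("div", [Node("a"), Node("span"),
                 Node("ul", [Node("li"), Node("li")])]),
    Node("div", [Node("a"), Node("span")]),
    Node("p"),
])
regions = find_regions(root)
patterns = [global_pattern(region) for region in regions]
```

On this toy page the sketch finds two regions, the three `div` records and the nested `li` items, and handles them in a single pass, which mirrors the abstract's idea of locating all data regions first and processing them together. A faithful version would align whole subtrees rather than flat tag lists.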
