Information Technology Journal

Research on Extraction Methods of Web Page's Document Logical Structure



Abstract

Based on an analysis of the characteristics of web page data sets and the difficulties of the document logical structure extraction task, this study proposes a method for extracting the document logical structure of web pages, together with four key technologies that support the extraction. For evaluation, the study downloaded and processed a number of web pages from Baidu Baike and from general sites related to two computer science courses: operating systems and computer networks. On Baidu Baike pages, the average error rate is 12.8% for the operating system course and 6.6% for the computer network course; on general web pages, the average error rate is 30% and 22.6%, respectively. The experimental results validate the effectiveness of the proposed method.
