Research on Extraction Methods of Web Page's Document Logical Structure

Wei Wang; Wei Wei; Qinghua Zheng; Jie Hu; Yingying Chen; Bin Zhou

Abstract

Based on an analysis of the characteristics of the web page data set and the difficulties of the document logical structure extraction task, this study proposes a method for extracting the document logical structure of web pages, together with four key technologies that support the extraction. The study then downloads and processes a number of web pages from Baidu Baike and from general sites related to two computer science courses, operating systems and computer networks. Evaluation on the Baidu Baike pages shows average error rates of 12.8% and 6.6% for the operating system and computer network courses, respectively; on general web pages the average error rates are 30% and 22.6%, respectively. The experimental results validate the effectiveness of the proposed method.

Bibliographic details

Source
Information Technology Journal, 2014, Issue 1, 9 pages
Authors
Wei Wang; Wei Wei; Qinghua Zheng; Jie Hu; Yingying Chen; Bin Zhou
Original format: PDF
Language: English
Chinese Library Classification: Information processing technology
Keywords
Document logical structure; Web information extraction; Minimum semantic logical block; Optimal sequence solving
