A new method of page standardization based on DOM

Abstract

With the rapid development of the Internet, both the volume of information and the number of websites have boomed. Because pages differ in style, structure, and content, a single model cannot extract information from all of them, and scanning a page line by line for useful information wastes time because of noise. Arranging all the information on a page into a DOM tree for searching is therefore a sensible choice, first because it raises the likelihood of accurate search results. Moreover, converting a web page into a tree helps identify the main frame of the page. In addition, unreadable characters caused by invalid conversion between languages are a barrier that cuts people off from information on websites in other parts of the world. Our work aims to solve these problems, making information from around the world both accessible and convenient to extract.
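
The abstract does not describe the authors' algorithm, so the following is only an illustrative sketch of the two ideas it names: decoding a page so that its text is readable, and arranging the page into a tree. It assumes that the "unreadable codes" refer to character-encoding mismatches, and it uses Python's standard `html.parser` module rather than a full browser-grade DOM. The names `Node`, `TreeBuilder`, `decode_page`, and `build_tree`, as well as the candidate encoding list, are hypothetical choices made for this example.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal tree node: tag name, parent, children, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simplified tree from HTML (an assumption, not the paper's method)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend only into elements that can hold children

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def decode_page(raw, candidates=("utf-8", "gb18030")):
    """Try each candidate encoding in order; fall back to lossy UTF-8."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def build_tree(html_text):
    """Parse decoded HTML text and return the root of the simplified tree."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root
```

A caller would pass the raw response bytes to `decode_page` and the resulting string to `build_tree`, then walk `children` from the returned root to locate the main content area. A production system would more likely rely on the encoding declared in the HTTP headers or a `<meta>` tag and on a standards-compliant HTML parser.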