METHOD, APPARATUS AND SYSTEM FOR EXTRACTING WEBPAGE CONTENT

Abstract

The present disclosure relates to a method, an apparatus and a system for extracting webpage content. The method for extracting webpage content includes: responding to a webpage browsing instruction triggered by a mobile client on a browser, to obtain the corresponding webpage; parsing the webpage to obtain the DOM nodes of the tags in the webpage script; obtaining a plug-in tag node from the DOM nodes; and, when the plug-in tag corresponding to the plug-in tag node is a tag of a predetermined type, extracting the plug-in resource corresponding to that plug-in tag. The present disclosure can extract content that complies with a specific protocol specification before the webpage has actually been rendered, which speeds up extraction of the predetermined webpage content and also speeds up webpage display. In addition, because this solution extracts plug-in resources on the browser terminal side without relying on a background server, it is technically easy to implement and can reduce development costs.
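The four steps above (obtain the page, parse it into tag nodes, select plug-in tag nodes, extract the resource of a predetermined type) can be illustrated with a minimal sketch. The abstract does not say which tags count as plug-in tags or what the predetermined type is, so this sketch assumes `<embed>`/`<object>` tags and the Flash MIME type `application/x-shockwave-flash`; it also uses Python's event-driven `html.parser` tokenizer as a lightweight stand-in for a full DOM builder. The names `TagNode`, `TagCollector`, `extract_plugin_resources`, `PLUGIN_TAGS` and `PREDETERMINED_TYPES` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical assumption: tags treated as "plug-in tags" in this sketch.
PLUGIN_TAGS = {"embed", "object"}
# Hypothetical assumption: the "predetermined type" whose resource is extracted.
PREDETERMINED_TYPES = {"application/x-shockwave-flash"}


@dataclass
class TagNode:
    """Lightweight stand-in for a DOM element node: tag name plus attributes."""
    tag: str
    attrs: dict = field(default_factory=dict)


class TagCollector(HTMLParser):
    """Tokenizes the page and records every start tag as a TagNode (no rendering)."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; valueless attributes give None.
        self.nodes.append(TagNode(tag, {k: (v or "") for k, v in attrs}))


def extract_plugin_resources(page_html, base_url):
    """Return absolute URLs of plug-in resources of a predetermined type."""
    # Step 1: parse the page into tag nodes without rendering it.
    collector = TagCollector()
    collector.feed(page_html)
    collector.close()

    resources = []
    for node in collector.nodes:
        # Step 2: keep only plug-in tag nodes.
        if node.tag not in PLUGIN_TAGS:
            continue
        # Step 3: check whether the plug-in tag is of the predetermined type.
        if node.attrs.get("type", "").lower() not in PREDETERMINED_TYPES:
            continue
        # Step 4: extract the resource URL (<embed src=...> or <object data=...>).
        url = node.attrs.get("src") or node.attrs.get("data")
        if url:
            resources.append(urljoin(base_url, url))
    return resources


page = """<html><body>
<embed src="/media/intro.swf" type="application/x-shockwave-flash">
<object data="clip.swf" type="application/x-shockwave-flash"></object>
<embed src="/media/song.mp3" type="audio/mpeg">
<img src="logo.png">
</body></html>"""

print(extract_plugin_resources(page, "https://example.com/page/"))
# → ['https://example.com/media/intro.swf', 'https://example.com/page/clip.swf']
```

Because the sketch only tokenizes the markup and never lays out or paints the page, it reflects the abstract's point that extraction can finish before the page is rendered; a real browser-side implementation would instead walk the DOM tree the browser engine already builds.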