METHOD AND DEVICE FOR CRAWLING WEBSITE DATA, STORAGE MEDIUM AND SERVER

Abstract

Disclosed are a method and device for crawling website data, a computer-readable storage medium, and a server. They address the problem that many websites require a verification code to be entered in order to block crawler systems, which leaves a crawler system unable to crawl data. The method provided by the present application comprises: initiating an access request to a target website whose data is to be crawled; upon receiving feedback information indicating that the target website requires input of a verification code, acquiring from the target website the target verification code picture corresponding to the feedback information; feeding the target verification code picture into a pre-trained machine learning model for recognition, to obtain a verification code answer output by the model; performing, according to the output verification code answer, the verification operation for which the target website requires the verification code; and, when verification by the target website is passed, crawling data from the target website.
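
The sequence of steps in the abstract can be sketched as a small control loop. This is only an illustrative outline under stated assumptions, not the implementation claimed in the application: the `Response` record, the function `crawl_with_verification`, the three injected callables (`fetch`, `recognize`, `submit_answer`) and the `max_attempts` retry limit are all hypothetical names introduced here, and the abstract itself does not say whether or how failed verifications are retried.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical response record; not a type defined by the source."""
    body: str
    # Set only when the site answers with a verification-code challenge.
    captcha_image: Optional[bytes] = None


def crawl_with_verification(
    fetch: Callable[[], Response],
    recognize: Callable[[bytes], str],
    submit_answer: Callable[[str], bool],
    max_attempts: int = 3,  # assumption: the abstract does not specify retries
) -> Optional[str]:
    """Return the crawled page body, or None if verification never passes."""
    for _ in range(max_attempts):
        # Step 1: initiate an access request to the target website.
        response = fetch()

        # No verification code requested: the data can be crawled directly.
        if response.captcha_image is None:
            return response.body

        # Steps 2-3: take the verification code picture and ask the
        # pre-trained model (here an injected callable) for an answer.
        answer = recognize(response.captcha_image)

        # Step 4: perform the verification operation with that answer.
        if submit_answer(answer):
            # Step 5: verification passed, so crawl the data.
            return fetch().body

    return None
```

Passing the three operations in as callables keeps the sketch independent of any particular HTTP client or model library; a caller would supply its own functions for requesting the page, running the recognizer, and submitting the answer.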