METHOD AND DEVICE FOR CRAWLING WEBSITE DATA, STORAGE MEDIUM AND SERVER

Abstract

Disclosed are a method and device for crawling website data, a computer-readable storage medium, and a server, which address the problem that many websites require a verification code to be entered in order to block crawler systems, leaving those systems unable to crawl data. The method provided by the present application comprises: initiating an access request to a target website whose data is to be crawled; receiving feedback information indicating that the target website requires input of a verification code, and then acquiring, from the target website, the target verification code picture corresponding to the feedback information; feeding the target verification code picture into a pre-trained machine learning model for recognition, to obtain the verification code answer output by the model; performing, according to the output answer, the verification operation required by the target website; and, once verification by the target website is passed, crawling data from the target website.
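The control flow claimed in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Page` type, the `crawl_with_captcha` function, the three injected callables (`fetch`, `submit_answer`, `recognize`), and the bounded retry count. The abstract does not specify how requests are made, what model is used, or whether a failed verification is retried.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    """Hypothetical response type: captcha_image is set only when the
    site answers with a verification-code challenge instead of data."""
    body: str
    captcha_image: Optional[bytes] = None


def crawl_with_captcha(
    url: str,
    fetch: Callable[[str], Page],               # initiates the access request
    submit_answer: Callable[[str, str], Page],  # performs the verification operation
    recognize: Callable[[bytes], str],          # stands in for the pre-trained model
    max_attempts: int = 3,                      # assumption: retries are not in the abstract
) -> str:
    """Return the page data, solving verification-code challenges on the way."""
    page = fetch(url)
    for _ in range(max_attempts):
        if page.captcha_image is None:
            # Verification passed (or was never required): crawl the data.
            return page.body
        # The site asked for a verification code: recognize the picture
        # and submit the recognized answer.
        answer = recognize(page.captcha_image)
        page = submit_answer(url, answer)
    if page.captcha_image is None:
        return page.body
    raise RuntimeError(f"verification not passed after {max_attempts} attempts")


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without a network or a model.
    def demo_fetch(url: str) -> Page:
        return Page(body="", captcha_image=b"fake-image-bytes")

    def demo_submit(url: str, answer: str) -> Page:
        if answer == "7K3P":
            return Page(body="<html>target data</html>")
        return Page(body="", captcha_image=b"fake-image-bytes")

    print(crawl_with_captcha("https://example.com", demo_fetch, demo_submit,
                             lambda image: "7K3P"))
```

Passing the network and recognition steps in as callables keeps the sketch independent of any particular HTTP client or model library. In practice they would be replaced by real implementations.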

Bibliographic data

Publication number: WO2019136960A1
Publication date: 2019-07-18
Original format: PDF
Applicant / patentee: ONE CONNECT SMART TECHNOLOGY CO. LTD. (SHENZHEN)
Application number: WO2018CN97499
Inventors: LI CHENGUANG; WANG PAN
Filing date: 2018-07-27
Classification: G06F17/30
Country: WO
Added to database: 2022-08-21 11:53:55
