METHOD AND DEVICE FOR CRAWLING TARGET CORPUS DATA, AND STORAGE MEDIUM
Abstract
Provided is a method for crawling target corpus data. After a crawling request for target information is received, the method first determines the crawling rule required for crawling a target corpus and invokes that rule to crawl, in sequence, a first title page URL list, a first list page URL list and a first content page URL list from an initial corpus. It then crawls a second list page URL list corresponding to the first title page URL list and generates a third list page URL list, crawls a second content page URL list corresponding to the third list page URL list and generates a third content page URL list, and thereby acquires content page data. Finally, it uses a target information crawling rule to crawl the target information and generate the target corpus data. Further provided are an electronic device and a computer storage medium. The method improves the efficiency and accuracy of crawling target corpus data.
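The staged crawl described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `CrawlRule` shape (one regular expression per page type), the `fetch` callback, the in-memory `PAGES` site, and the URL patterns are all hypothetical. The abstract also does not say how the third lists are generated; the sketch assumes each is the de-duplicated union of the corresponding first and second lists.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlRule:
    # Hypothetical rule shape: one regex per page type plus one for the target.
    title_link: str
    list_link: str
    content_link: str
    target: str


def extract(pattern, text):
    """Return all matches of pattern in text, in order, without duplicates."""
    return list(dict.fromkeys(re.findall(pattern, text)))


def merge(first, second):
    """Order-preserving, de-duplicated union of two URL lists (an assumption)."""
    return list(dict.fromkeys(first + second))


def crawl_target_corpus(initial_corpus, fetch, rule):
    # Step 1: crawl the three first-level URL lists from the initial corpus.
    title_urls_1 = extract(rule.title_link, initial_corpus)
    list_urls_1 = extract(rule.list_link, initial_corpus)
    content_urls_1 = extract(rule.content_link, initial_corpus)

    # Step 2: title pages yield the second list-page URLs; merge into the third.
    list_urls_2 = [u for t in title_urls_1 for u in extract(rule.list_link, fetch(t))]
    list_urls_3 = merge(list_urls_1, list_urls_2)

    # Step 3: list pages yield the second content-page URLs; merge into the third.
    content_urls_2 = [u for p in list_urls_3 for u in extract(rule.content_link, fetch(p))]
    content_urls_3 = merge(content_urls_1, content_urls_2)

    # Step 4: fetch each content page and apply the target-information rule.
    return {url: extract(rule.target, fetch(url)) for url in content_urls_3}


# Hypothetical in-memory site standing in for real HTTP requests.
PAGES = {
    "/title/a": "see /list/2",
    "/list/1": "items: /content/x /content/y",
    "/list/2": "items: /content/y /content/z",
    "/content/x": "price: 10",
    "/content/y": "price: 20",
    "/content/z": "price: 30",
}

rule = CrawlRule(
    title_link=r"/title/\w+",
    list_link=r"/list/\w+",
    content_link=r"/content/\w+",
    target=r"price: (\d+)",
)

result = crawl_target_corpus(
    "/title/a /list/1 /content/x",
    lambda url: PAGES.get(url, ""),
    rule,
)
print(result)
```

In this toy site, `/list/2` and `/content/z` are reachable only through the title page, so they appear only in the third lists. That is one plausible reading of why the method builds the second and third lists rather than relying on the first-level lists alone.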