Building of a web corpus with the help of a reference web crawl

Abstract

Computer-implemented method for building a web corpus (WCD), comprising the steps of: - sending, by a web crawler (WC), a query to a reference web crawl agent (RWCA), this query containing at least one identifier of a resource; - receiving, by the web crawler (WC), a response from the reference web crawl agent (RWCA); - if this response does not contain the resource identified by the identifier, downloading, by the web crawler (WC), the resource from the website (WS) corresponding to the identifier and adding the resource to the web corpus (WCD); and - if this response contains the resource identified by the identifier, adding the resource to the web corpus (WCD).
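
The steps of the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (build_web_corpus, query_reference_agent, download_from_website), the use of None to signal that the response does not contain the resource, the one-identifier-per-query simplification, and the in-memory dictionaries standing in for the RWCA, the website (WS), and the corpus (WCD) are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_web_corpus(
    identifiers: Iterable[str],
    query_reference_agent: Callable[[str], Optional[bytes]],
    download_from_website: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Build a corpus, preferring the reference crawl over a fresh download.

    Assumption: the agent returns None when its response does not
    contain the resource identified by the identifier.
    """
    corpus: Dict[str, bytes] = {}
    for identifier in identifiers:
        # Send a query with the identifier and receive the response.
        resource = query_reference_agent(identifier)
        if resource is None:
            # Response lacks the resource: download it from the website.
            resource = download_from_website(identifier)
        # In both cases the resource is added to the corpus.
        corpus[identifier] = resource
    return corpus


# Hypothetical stand-ins so the sketch runs without network access.
reference_crawl: Dict[str, bytes] = {
    "https://example.org/a": b"<html>A (from reference crawl)</html>",
}
downloaded: List[str] = []  # records which identifiers hit the website


def query_reference_agent(identifier: str) -> Optional[bytes]:
    return reference_crawl.get(identifier)


def download_from_website(identifier: str) -> bytes:
    downloaded.append(identifier)
    return b"<html>fresh copy of " + identifier.encode() + b"</html>"


corpus = build_web_corpus(
    ["https://example.org/a", "https://example.org/b"],
    query_reference_agent,
    download_from_website,
)
```

In this run only the second identifier is absent from the stand-in reference crawl, so it is the only one fetched from the website; the first is added to the corpus directly from the agent's response.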
