
Building of a web corpus with the help of a reference web crawl



Abstract

Computer-implemented method for building a web corpus (WCD), comprising the steps of:
- sending, by a web crawler (WC), a query to a reference web crawl agent (RWCA), this query containing at least one identifier of a resource;
- receiving, by the web crawler (WC), a response from the reference web crawl agent (RWCA);
- if this response does not contain the resource identified by the identifier, downloading, by the web crawler (WC), the resource from the website (WS) corresponding to the identifier and adding the resource to the web corpus (WCD); and
- if this response contains the resource identified by the identifier, adding the resource to the web corpus (WCD).
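The claimed steps can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ReferenceWebCrawlAgent`, `WebCrawler`, `query`, `crawl` and the injected `download` callable are hypothetical. The reference crawl is assumed to be an in-memory dict keyed by URL, and the network fetch from the website (WS) is replaced by a function argument so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class ReferenceWebCrawlAgent:
    """Hypothetical stand-in for the RWCA: answers from a prior reference crawl."""
    reference_crawl: Dict[str, bytes]

    def query(self, identifiers: Iterable[str]) -> Dict[str, bytes]:
        # The response holds only the resources the reference crawl already has.
        return {i: self.reference_crawl[i] for i in identifiers if i in self.reference_crawl}


@dataclass
class WebCrawler:
    rwca: ReferenceWebCrawlAgent
    # Hypothetical download function standing in for an HTTP fetch from the website (WS).
    download: Callable[[str], bytes]
    corpus: Dict[str, bytes] = field(default_factory=dict)  # the web corpus (WCD)
    downloads: int = 0  # number of resources that had to be fetched from websites

    def crawl(self, identifiers: List[str]) -> None:
        # Send one query containing the identifiers and receive the RWCA's response.
        response = self.rwca.query(identifiers)
        for identifier in identifiers:
            if identifier in response:
                # The response contains the resource: add it to the corpus directly.
                self.corpus[identifier] = response[identifier]
            else:
                # Not in the response: download from the website, then add it.
                self.corpus[identifier] = self.download(identifier)
                self.downloads += 1


# Usage: one URL is served by the reference crawl, the other must be downloaded.
reference = ReferenceWebCrawlAgent({"http://example.com/a": b"<html>A</html>"})
crawler = WebCrawler(reference, download=lambda url: b"<html>fetched " + url.encode() + b"</html>")
crawler.crawl(["http://example.com/a", "http://example.com/b"])
print(crawler.downloads)
```

Batching several identifiers into a single query is one reading of "at least one identifier"; the sketch does this so that only resources missing from the reference crawl cost a download.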


Bibliographic data

Publication number: EP2650802B1
Publication date: 2018-10-24
Original format: PDF
Applicant/patentee: DASSAULT SYSTÈMES
Application number: EP20120305432
Inventors: RICHARD SEBASTIEN YVON; GREHANT XAVIER; FERENCZI JIM
Filing date: 2012-04-12
Classification: G06F17/30
Country: EP
Database entry date: 2022-08-21 13:20:05
