SYSTEMS AND METHODS OF HANDLING INTERNET SPIDERS

Abstract

Aspects relate to identifying Internet spiders using a plurality of instances of one or more URLs that reference resources available from a first domain. Instances of the URLs are distributed at other Internet domains. Spiders crawling those domains will activate those URL instances, resulting in requests for the resources referenced by the URLs. An entity that generates a number of requests for the same resource, potentially from a multitude of URL instances, can be categorized as a spider. Similarly, generating a number of requests for resources identified by different URLs can also be categorized as spider behavior. In some cases, the first domain may not have a browsable site infrastructure, such that a spider would not readily crawl it by following internal links. The URLs can refer to custom queries created by various users, who can provide the URLs on their pages, such as on social networking sites.
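The two heuristics in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `SpiderDetector`, the threshold values, the use of an IP-like string as the requester key, and the idea that each URL instance carries an identifier for the page where it was placed are all assumptions made for illustration.

```python
from collections import defaultdict


class SpiderDetector:
    """Flags requesters whose request pattern resembles a spider's.

    Hypothetical sketch: names and thresholds are illustrative assumptions,
    not values given in the source.
    """

    def __init__(self, same_resource_threshold=3, distinct_resource_threshold=5):
        self.same_resource_threshold = same_resource_threshold
        self.distinct_resource_threshold = distinct_resource_threshold
        # requester -> resource -> set of URL instance ids (where the URL was placed)
        self._instances = defaultdict(lambda: defaultdict(set))

    def record(self, requester, resource, instance_id):
        """Record one request arriving at the first domain."""
        self._instances[requester][resource].add(instance_id)

    def is_spider(self, requester):
        by_resource = self._instances.get(requester, {})
        # Heuristic 1: the same resource requested through many URL instances.
        if any(len(ids) >= self.same_resource_threshold
               for ids in by_resource.values()):
            return True
        # Heuristic 2: many different resources requested by one requester.
        return len(by_resource) >= self.distinct_resource_threshold


detector = SpiderDetector()
# One requester activates the same URL from three different hosting pages.
for page in ("social-a/p1", "social-b/p2", "blog-c/p3"):
    detector.record("203.0.113.7", "/q/42", page)
# Another requester activates a single instance once.
detector.record("198.51.100.9", "/q/42", "social-a/p1")

print(detector.is_spider("203.0.113.7"))   # True
print(detector.is_spider("198.51.100.9"))  # False
```

Keeping a set of instance identifiers per resource means repeated activations of the same instance do not count twice; whether that matches the intended behavior depends on details the abstract does not state.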

Bibliographic details

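  • Publication number: US2011055400A1

    Patent type:

  • Publication date: 2011-03-03

    Original format: PDF

  • Applicant/Patentee: JAMES ALEXANDER

    Application number: US20100847077

  • Inventor: JAMES ALEXANDER

    Filing date: 2010-07-30

  • Classification: G06F15/173

  • Country: US

  • Database entry time: 2022-08-21 18:11:36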
