
A Faceted Crawler for the Twitter Service

Abstract

Researchers today have at their disposal valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs), a probe-based one and a streaming one, each of which imposes different limitations on the data collection process. In this paper, we present a general architecture that facilitates faceted crawling of the service and thereby simplifies retrieval. We give implementation details of our system and provide a simple way to express the crawling process, i.e., the crawl flow. We experimentally evaluate the system on a variety of faceted crawls, demonstrating its efficacy for the online medium.