ICWSM 2012

A Systematic Investigation of Blocking Strategies for Real-Time Classification of Social Media Content into Events

Abstract

Events play a prominent role in our lives; consequently, many social media documents describe or relate to some event. Organizing social media documents by event therefore seems a promising approach to better manage the ever-increasing amount of user-generated content in social media applications. Such an organization would, for instance, support navigating data by event or notifying users about new postings related to the events they are interested in. The challenge is to automate this process so that incoming documents are assigned to their corresponding events without any user intervention. We present a system that classifies a stream of social media data into a growing and evolving set of events. To scale up to the data sizes and data rates of social media applications, a candidate retrieval (or blocking) step is crucial: it reduces the number of events considered as potential candidates to which an incoming data point could belong. In this paper, we present different blocking strategies and experimentally compare them with respect to their cost vs. effectiveness tradeoff. We show that a blocking strategy selecting the 60 events closest in upload time reaches an F-measure of about 85.1% while processing incoming documents in 32 ms on average. We thus provide a principled approach for scaling up the classification of social media documents into events and for processing the incoming document stream in real time.
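The time-based blocking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each event is reduced to a single representative upload time (for example, the mean upload time of its documents), the candidates are the `k` events nearest to the incoming document's upload time (defaulting to `k = 60`, the setting reported above), and the actual classifier is replaced by a hypothetical `score` callable. The names `TimeBlockingIndex`, `add_event`, `candidates`, and `classify` are all hypothetical.

```python
import bisect
from typing import Callable, List, Optional


class TimeBlockingIndex:
    """Hypothetical time-based blocking index.

    Assumption: each event is summarized by one representative upload time
    (e.g. the mean upload time of its documents).
    """

    def __init__(self) -> None:
        self._times: List[float] = []  # representative event times, sorted
        self._ids: List[str] = []      # event ids, aligned with self._times

    def add_event(self, event_id: str, event_time: float) -> None:
        """Register a new event, keeping both lists sorted by time."""
        pos = bisect.bisect_right(self._times, event_time)
        self._times.insert(pos, event_time)
        self._ids.insert(pos, event_id)

    def candidates(self, doc_time: float, k: int = 60) -> List[str]:
        """Return ids of up to k events closest in time to doc_time, nearest first."""
        n = len(self._times)
        right = bisect.bisect_left(self._times, doc_time)  # first time >= doc_time
        left = right - 1                                   # last time < doc_time
        result: List[str] = []
        # Expand outward from doc_time, always taking the nearer neighbor.
        while len(result) < k and (left >= 0 or right < n):
            take_left = right >= n or (
                left >= 0
                and doc_time - self._times[left] <= self._times[right] - doc_time
            )
            if take_left:
                result.append(self._ids[left])
                left -= 1
            else:
                result.append(self._ids[right])
                right += 1
        return result


def classify(
    index: TimeBlockingIndex,
    doc_time: float,
    score: Callable[[str], float],
    k: int = 60,
) -> Optional[str]:
    """Assign a document to the best-scoring event among the k blocked candidates.

    `score` is a hypothetical stand-in for the actual classifier; only the
    candidates returned by blocking are ever scored.
    """
    cands = index.candidates(doc_time, k)
    if not cands:
        return None
    return max(cands, key=score)


if __name__ == "__main__":
    idx = TimeBlockingIndex()
    for eid, t in [("concert", 100.0), ("match", 250.0), ("parade", 400.0), ("festival", 90.0)]:
        idx.add_event(eid, t)
    print(idx.candidates(120.0, k=2))  # ['concert', 'festival']
    print(classify(idx, 120.0, lambda e: {"festival": 0.9}.get(e, 0.0), k=2))  # festival
```

Keeping event times in a sorted list lets each candidate lookup run in O(log n + k) time, so the expensive scoring step sees at most `k` events regardless of how many events exist. Inserting into a Python list is O(n); a balanced tree would avoid that if the event set grows very large.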