
Event identification in web social media through named entity recognition and topic modeling



Abstract

The problem of identifying important online or real-life events from large textual document streams that are freely available on the World Wide Web is attracting growing attention, given the flourishing of the social web. An event triggers discussion and comments on the WWW, especially in the blogosphere and in microblogging services. Consequently, one should be able to identify the entities, topics, time, and location involved in an event by analyzing information publicly available on the web, create semantically rich representations of events, and then use this information to provide interesting results or to summarize news for users. In this paper, we define the concept of an important event and propose an efficient methodology for performing event detection on large time-stamped web document streams. The methodology integrates named entity recognition, dynamic topic map discovery, topic clustering, and peak detection techniques. In addition, we propose an efficient algorithm for detecting all important events in a document stream. We evaluate the proposed methodology and algorithm extensively on a dataset of 7 million blog posts, as well as through an international social event detection challenge. The results provide evidence that our approach: a) accurately detects important events, b) creates semantically rich representations of the detected events, c) can be adequately parameterized to correspond to different social perceptions of the event concept, and d) is suitable for online event detection on very large datasets. The expected complexity of the online facet of the proposed algorithm is linear with respect to the number of documents in the data stream.
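The abstract names peak detection over a time-stamped document stream as one ingredient of the methodology but gives no details. As a rough illustration only, the sketch below counts per-day mentions of a named entity in a single pass and flags days whose count exceeds a trailing-window threshold. The function names, the `(day, entities)` input shape, and the `window`, `k`, and `min_count` parameters are all assumptions made for this example. They are not taken from the paper, and the thresholding rule is a generic one, not the authors' algorithm.

```python
from collections import Counter, deque
from datetime import date
from statistics import mean, pstdev


def daily_counts(docs, entity):
    """Count, per day, the documents that mention `entity`.

    `docs` is an iterable of (day, set_of_entities) pairs; this input
    shape is an assumption made for the example. One pass over the stream.
    """
    counts = Counter()
    for day, entities in docs:
        if entity in entities:
            counts[day] += 1
    return counts


def detect_peaks(series, window=3, k=2.0, min_count=3):
    """Return indices whose value exceeds mean + k * std of the
    preceding `window` values and is at least `min_count`.

    A generic trailing-window rule, used here only as a stand-in for
    whatever peak detector the paper actually employs.
    """
    peaks = []
    history = deque(maxlen=window)
    for i, value in enumerate(series):
        if len(history) == window:
            recent = list(history)
            threshold = mean(recent) + k * pstdev(recent)
            if value >= min_count and value > threshold:
                peaks.append(i)
        history.append(value)
    return peaks


# Hypothetical toy stream: three documents over two days.
docs = [
    (date(2013, 1, 1), {"Athens"}),
    (date(2013, 1, 1), {"Athens", "Paris"}),
    (date(2013, 1, 2), {"Paris"}),
]
athens = daily_counts(docs, "Athens")

# Hypothetical daily mention counts with one burst at index 4.
burst_days = detect_peaks([1, 1, 1, 1, 9, 1, 1, 1])
```

Both functions touch each input item once and keep only a fixed-size window of history, which is in the spirit of the linear online complexity the abstract reports, although the paper's own analysis concerns its full algorithm rather than this simplified rule.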

Bibliographic record

  • Source
    Data & Knowledge Engineering | 2013, Issue 11 | pp. 1-24 | 24 pages
  • Author affiliations

    Aristotle University of Thessaloniki, Dept. of Electrical and Computer Engineering, GR54124 Thessaloniki, Greece; Information Technologies Institute, Centre for Research and Technology - Hellas, GR57001 Thessaloniki, Greece

    Aristotle University of Thessaloniki, Dept. of Electrical and Computer Engineering, GR54124 Thessaloniki, Greece; Information Technologies Institute, Centre for Research and Technology - Hellas, GR57001 Thessaloniki, Greece

    Aristotle University of Thessaloniki, Dept. of Electrical and Computer Engineering, GR54124 Thessaloniki, Greece; Information Technologies Institute, Centre for Research and Technology - Hellas, GR57001 Thessaloniki, Greece

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Event identification; Social media analysis; Topic maps; Peak detection; Topic clustering;

