Conference on Empirical Methods in Natural Language Processing

Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation

Abstract

Work on cross-document coreference resolution (CDCR) has primarily focused on news articles, with little to no work on social media. Yet social media may be particularly challenging, since short messages provide little context and informal names are pervasive. We introduce a new Twitter corpus, with entity annotations for entity clusters, that supports CDCR. Our corpus draws from Twitter data surrounding the 2013 Grammy music awards ceremony, providing a large set of annotated tweets focusing on a single event. To establish a baseline, we evaluate two CDCR systems and consider the performance impact of each system component. Furthermore, we augment one system to include temporal information, which can be helpful when documents (such as tweets) arrive in a specific order. Finally, we include annotations linking the entities to a knowledge base to support entity linking.
