Venue: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition


Abstract

Named entity recognition (NER) systems trained on newswire perform very badly when tested on Twitter. Signals that were reliable in copy-edited text disappear almost entirely in Twitter's informal chatter, requiring the construction of specialized models. Using well-understood techniques, we set out to improve Twitter NER performance when given a small set of annotated training tweets. To leverage unlabeled tweets, we build Brown clusters and word vectors, enabling generalizations across distributionally similar words. To leverage annotated newswire data, we employ an importance weighting scheme. Taken all together, we establish a new state-of-the-art on two common test sets. Though it is well-known that word representations are useful for NER, supporting experiments have thus far focused on newswire data. We emphasize the effectiveness of representations on Twitter NER, and demonstrate that their inclusion can improve performance by up to 20 F1.
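
The abstract does not spell out how the cluster features or the importance weights are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the common practice of turning a word's Brown cluster bit-string path into prefix features, so that distributionally similar spellings share features, and it assumes a simple constant down-weighting of newswire instances. The cluster table, the prefix lengths, the `newswire_weight` value, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical toy table: word -> Brown cluster bit-string path.
# Real paths would come from clustering a large set of unlabeled tweets.
TOY_CLUSTERS: Dict[str, str] = {
    "tomorrow": "0110100",
    "2morrow": "0110101",
    "tmrw": "0110101",
    "london": "1011000",
}


def brown_prefix_features(
    word: str,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4, 6),  # assumed lengths
) -> List[str]:
    """Return one feature per prefix length of the word's cluster path."""
    path = clusters.get(word.lower())
    if path is None:
        return ["brown=UNK"]
    return [f"brown{n}={path[:n]}" for n in prefix_lengths]


def instance_weights(
    n_twitter: int,
    n_newswire: int,
    newswire_weight: float = 0.1,  # assumed value, not from the paper
) -> List[float]:
    """Weight in-domain tweets at 1.0 and out-of-domain newswire lower."""
    return [1.0] * n_twitter + [newswire_weight] * n_newswire


if __name__ == "__main__":
    for w in ("tomorrow", "2morrow", "london", "unseen"):
        print(w, brown_prefix_features(w, TOY_CLUSTERS))
    print(instance_weights(2, 3))
```

In this toy table, "tomorrow" and "2morrow" share every prefix feature, which is the kind of generalization across spelling variants that the abstract attributes to word representations. The resulting weights could be passed to any learner that accepts per-instance weights.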
