The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition


Abstract

Named entity recognition (NER) systems trained on newswire perform very badly when tested on Twitter. Signals that were reliable in copy-edited text disappear almost entirely in Twitter's informal chatter, requiring the construction of specialized models. Using well-understood techniques, we set out to improve Twitter NER performance when given a small set of annotated training tweets. To leverage unlabeled tweets, we build Brown clusters and word vectors, enabling generalizations across distributionally similar words. To leverage annotated newswire data, we employ an importance weighting scheme. Taken all together, we establish a new state-of-the-art on two common test sets. Though it is well-known that word representations are useful for NER, supporting experiments have thus far focused on newswire data. We emphasize the effectiveness of representations on Twitter NER, and demonstrate that their inclusion can improve performance by up to 20 F1.

Bibliographic details

Source
Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | 2015 | pp. 735-745 | 11 pages
Authors
Colin Cherry; Hongyu Guo
Original format: PDF

