Conference: Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media

Abstract

Recognizing named entities in a document is a key task in many NLP applications. Although current state-of-the-art approaches to this task reach high performance on clean text (e.g. newswire genres), these algorithms degrade dramatically when they are moved to noisy environments such as social media domains. We present two systems that address the challenges of processing social media data using character-level phonetics and phonology, word embeddings, and Part-of-Speech tags as features. The first model is a multitask end-to-end Bidirectional Long Short-Term Memory (BLSTM)-Conditional Random Field (CRF) network whose output layer contains two CRF classifiers. The second model uses a multitask BLSTM network as a feature extractor that transfers the learning to a CRF classifier for the final prediction. Our systems outperform the current state-of-the-art F1 scores on the Workshop on Noisy User-generated Text 2017 dataset by 2.45% and 3.69%, establishing a more suitable approach for social media environments.
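Both systems end in a CRF classifier that makes the final prediction over the whole label sequence. The abstract does not give implementation details, so the following is only a minimal sketch of how a linear-chain CRF is typically decoded (Viterbi search), not the authors' code. The function name `viterbi_decode`, the label set, and all scores are hypothetical values chosen for illustration; in a real system the emission scores would come from the BLSTM and the transition scores would be learned.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]   : score of label y at position t (e.g. from a BLSTM).
    transitions[p][y] : score of moving from label p to label y.
    """
    if not emissions:
        return []
    # best[y] = score of the best path that ends in label y at the current position
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the previous label that maximizes path score + transition score
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full sequence
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token input (illustrative values only).
labels = ["O", "B-person", "I-person"]
emissions = [
    {"O": 2.0, "B-person": 0.5, "I-person": 0.1},
    {"O": 0.3, "B-person": 1.0, "I-person": 1.2},
    {"O": 0.2, "B-person": 0.4, "I-person": 1.5},
]
transitions = {
    "O": {"O": 0.5, "B-person": 0.2, "I-person": -10.0},  # O -> I is penalized
    "B-person": {"O": 0.0, "B-person": -1.0, "I-person": 1.0},
    "I-person": {"O": 0.1, "B-person": -0.5, "I-person": 0.5},
}
decoded = viterbi_decode(emissions, transitions, labels)
print(decoded)  # ['O', 'B-person', 'I-person']
```

With these scores, picking the best label for each token independently would tag the second token `I-person` directly after `O`, which is not a valid BIO sequence. Decoding over the whole sequence with transition scores avoids that, which is one common reason to place a CRF on top of a BLSTM on noisy text.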
