Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Cross-Domain and Semisupervised Named Entity Recognition in Chinese Social Media: A Unified Model



Abstract

Named entity recognition (NER) in Chinese social media is an important but challenging task because Chinese social media language is informal and noisy. Most previous NER methods focus on in-domain supervised learning, which is limited by the scarcity of annotated social media data. In this paper, we show that sufficient corpora from formal domains and massive unannotated text can be combined to improve NER performance in social media. We propose a unified model that can learn from both out-of-domain corpora and in-domain unannotated text. The unified model is composed of two parts: one for cross-domain learning and the other for semisupervised learning. Cross-domain learning acquires out-of-domain information based on domain similarity. Semisupervised learning acquires in-domain unannotated information through self-training. Experimental results show that our unified model yields a 9.57% improvement over strong baselines and achieves state-of-the-art performance.
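The abstract names the two ingredients but gives no implementation details. The following is only a generic sketch of what "weighting out-of-domain data by domain similarity" and "self-training on unannotated text" can look like, not the authors' model. Every name here (`char_profile`, `cosine`, `weight_out_of_domain`, `select_pseudo_labels`), the character-count similarity measure, and the 0.9 confidence threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_profile(sentences):
    """Count characters over a list of sentences (a crude domain profile)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def weight_out_of_domain(out_sentences, in_sentences):
    """Pair each out-of-domain sentence with its similarity to the
    in-domain profile (hypothetical weighting scheme)."""
    target = char_profile(in_sentences)
    return [(s, cosine(Counter(s), target)) for s in out_sentences]


def select_pseudo_labels(predictions, threshold=0.9):
    """Self-training step: keep only predictions at or above the
    confidence threshold. predictions: (sentence, tags, confidence)."""
    return [(s, tags) for s, tags, conf in predictions if conf >= threshold]


# Out-of-domain sentences that resemble the in-domain text get a higher weight.
in_domain = ["abab", "abba"]
weighted = weight_out_of_domain(["aabb", "xyz"], in_domain)

# Only the confident prediction is kept as a pseudo-labeled training example.
pseudo = select_pseudo_labels([("s1", ["O"], 0.95), ("s2", ["B"], 0.5)])
```

In a full pipeline, the similarity weight would typically scale each out-of-domain example's contribution to training, and the selected pseudo-labeled sentences would be added to the training set before retraining. How the paper actually does either is not stated in the abstract.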
