Conference: IEEE International Conference on Big Data

Probabilistic Named Entity Recognition for nonstandard format entities using cooccurrence word embeddings



Abstract

Short text has become widespread on social media platforms such as Twitter and Facebook, where users typically adopt nonstandard-format terms when posting. This poses challenges for Information Retrieval (IR) and Natural Language Processing (NLP), and standard or classical methods tend not to perform well in this domain. In this paper, we address one of these challenges in IR: Named Entity Recognition (NER). We introduce a novel probabilistic approach that targets entities occurring in an informal (nonstandard) format within short text. The Probabilistic Named Entity Recognition (PNER) model identifies these entities using cooccurrence patterns, which were detected using the word cooccurrence embeddings of 278.6 million tweets. The results show an improvement of 7% over two standard methods when they are used in combination with PNER. The testing dataset was created using the standard methods, supplemented with street names and places taken from the OpenStreetMap (OSM) database.
