IEEE International Conference on Big Data

Unsupervised domain-agnostic identification of product names in social media posts

Abstract

Product name recognition is a significant practical problem, spurred by the growing number of platforms for discussing products, such as social media and the product review features of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as the sentiment surrounding a product. Much existing research on product name identification has been domain-specific (e.g., identifying mobile phone models) and has used supervised or semi-supervised methods. With massive numbers of new products released to the market every year, such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.
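
The abstract gives no implementation details, so the following is only a minimal Python sketch of the two-step shape it describes, not the authors' method. Assumptions, all hypothetical: each token arrives already tagged as (word, POS tag, NER label), standing in for the pretrained POS tagger and CRF model; the "simple pattern" is a maximal run of Penn Treebank NNP/CD tokens containing at least one proper noun and not labeled PERSON, LOCATION or ORGANIZATION; and step (b) swaps in a tiny 2-means clustering over hand-made toy vectors (in place of embeddings trained on the posts), keeping the larger cluster.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: (word, POS tag, NER label) per token, standing in
# for the output of a pretrained POS tagger and a pretrained CRF NER model.
Token = Tuple[str, str, str]

# Assumption: NER labels that rule a token out as part of a product name.
NON_PRODUCT_LABELS = {"PERSON", "LOCATION", "ORGANIZATION"}


def candidate_names(tokens: Sequence[Token]) -> List[str]:
    """Step (a): collect maximal runs of proper-noun (NNP) and number (CD)
    tokens that contain at least one proper noun, e.g. "iPhone 12"."""
    candidates: List[str] = []
    span: List[Token] = []

    def flush() -> None:
        # Emit the current run only if it contains a proper noun, then reset.
        if any(pos == "NNP" for _, pos, _ in span):
            candidates.append(" ".join(word for word, _, _ in span))
        span.clear()

    for word, pos, ner in tokens:
        if pos in ("NNP", "CD") and ner not in NON_PRODUCT_LABELS:
            span.append((word, pos, ner))
        else:
            flush()
    flush()
    return candidates


def two_means(vectors: List[List[float]], iters: int = 10) -> List[int]:
    """Minimal 2-means clustering with deterministic farthest-point init."""
    first = vectors[0]
    second = max(vectors, key=lambda v: math.dist(v, first))
    centroids = [list(first), list(second)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [min((0, 1), key=lambda k: math.dist(v, centroids[k]))
                  for v in vectors]
        # Move each non-empty cluster's centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def filter_candidates(embeddings: Dict[str, List[float]]) -> List[str]:
    """Step (b): cluster candidate embeddings and keep the larger cluster
    (assumption: genuine product names outnumber spurious candidates)."""
    names = list(embeddings)
    labels = two_means([embeddings[n] for n in names])
    keep = max((0, 1), key=labels.count)
    return [n for n, lab in zip(names, labels) if lab == keep]


if __name__ == "__main__":
    post = [("Loving", "VBG", "O"), ("my", "PRP$", "O"), ("new", "JJ", "O"),
            ("iPhone", "NNP", "O"), ("12", "CD", "O"), ("from", "IN", "O"),
            ("Apple", "NNP", "ORGANIZATION"), ("since", "IN", "O"),
            ("Monday", "NNP", "O")]
    print(candidate_names(post))  # ['iPhone 12', 'Monday']

    # Toy 2-d vectors; real embeddings would be trained on the posts.
    toy = {"iPhone 12": [0.9, 0.1], "Galaxy S10": [0.85, 0.2],
           "Pixel 4": [0.8, 0.15], "Monday": [0.1, 0.9]}
    print(filter_candidates(toy))  # ['iPhone 12', 'Galaxy S10', 'Pixel 4']
```

The example shows why step (b) is needed: the pattern in step (a) also admits "Monday", which the clustering then separates from the product-like candidates. Keeping the larger cluster is just one plausible filtering rule for a sketch; other unsupervised criteria could serve the same purpose.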
