Conference: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Novel Semantics-based Distributed Representations for Message Polarity Classification using Deep Convolutional Neural Networks

Abstract

Unsupervised learning of distributed representations (word embeddings) obviates the need for task-specific feature engineering in various NLP applications. However, such representations, learned from massive text datasets, do not faithfully capture the finer semantic information that specific applications require in the feature space. This is because (a) the models that learn such representations ignore the linguistic structure of sentences, (b) they fail to capture polysemous usages of words, and (c) they ignore pre-existing semantic information from manually created ontologies. In this paper, we propose three semantics-based distributed representations of words and phrases as features for message polarity classification: Sentiment-Specific Multi-Word Expression Embeddings (SSMWE) are sentiment-encoded distributed representations of multi-word expressions (MWEs); Sense-Disambiguated Word Embeddings (SDWE) are sense-specific distributed representations of words; and WordNet Embeddings (WNE) are distributed representations of the hypernym and hyponym of the correct sense of a given word. We examine the effects of incorporating these features in a convolutional neural network (CNN) model, evaluated on the SemEval benchmark dataset. Our approach of using these novel features yields a 14.24% improvement in the macro-averaged F1 score on SemEval datasets over existing methods. While we have shown promising results in Twitter sentiment classification, we believe that the method is general enough to be applied to many NLP applications where finer semantic analysis is required.
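For readers unfamiliar with the metric reported above, the following is a minimal sketch of a macro-averaged F1 computation in plain Python. It is not the authors' evaluation code: the function name `macro_f1`, the `labels` parameter, and the toy label lists are assumptions made for illustration. The abstract does not say which classes enter the average, so the sketch defaults to all classes seen and lets the caller restrict the average to a subset (for example, only the positive and negative classes).

```python
from collections import Counter


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-class F1 over `labels`.

    `labels` is a hypothetical convenience parameter: when omitted,
    every class that appears in `gold` or `pred` is averaged.
    """
    if labels is None:
        labels = sorted(set(gold) | set(pred))

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    scores = []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)

    # Macro averaging: every class counts equally, regardless of its size.
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print(macro_f1(gold, pred))
print(macro_f1(gold, pred, labels=["positive", "negative"]))
```

Because each class contributes equally to the mean, a classifier cannot score well simply by favoring the most frequent polarity, which is one reason the metric suits imbalanced message polarity data.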