
Word embedding composition for data imbalances in sentiment and emotion classification


Abstract

Text classification often faces the problem of imbalanced training data. This is true in sentiment analysis and is particularly prominent in emotion classification, where multiple emotion categories are very likely to produce naturally skewed training data. Different sampling methods have been proposed to improve classification performance by reducing the imbalance ratio between training classes. However, data sparseness and the small-disjunct problem remain obstacles to generating new samples for minority classes when the data are skewed and limited. Methods that produce meaningful samples for smaller classes, rather than simple duplication, are essential to overcoming this problem. In this paper, we present an oversampling method based on word embedding compositionality which produces meaningful, balanced training data. We first use a large corpus to train a continuous skip-gram model, forming a word embedding model that maintains the syntactic and semantic integrity of the word features. Then, a compositional algorithm based on recursive neural tensor networks is used to construct sentence vectors from the word embedding model. Finally, we use the SMOTE algorithm as an oversampling method to generate samples for the minority classes and produce a fully balanced training set. Evaluation results on two quite different tasks show that both the feature composition method and the oversampling method are important in obtaining improved classification results. Our method effectively addresses the data imbalance issue and consequently achieves improved results for both sentiment and emotion classification.
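The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes gensim (≥ 4) for the continuous skip-gram model and imbalanced-learn for SMOTE, uses a tiny hypothetical corpus, and replaces the recursive neural tensor network (RNTN) composition step with a simple average of word embeddings purely for illustration.

```python
# Hypothetical sketch of the oversampling pipeline from the abstract.
# Assumptions: gensim >= 4 (skip-gram Word2Vec), imbalanced-learn (SMOTE).
# Sentence vectors are averaged word embeddings, a stand-in for the RNTN
# composition used in the paper.
import numpy as np
from gensim.models import Word2Vec
from imblearn.over_sampling import SMOTE

# Toy tokenised corpus with a skewed label distribution
# (label 1 = minority emotion class).
sentences = [
    ["the", "film", "was", "wonderful"],
    ["i", "really", "enjoyed", "this", "movie"],
    ["a", "dull", "and", "boring", "story"],
    ["the", "plot", "was", "predictable"],
    ["acting", "felt", "flat", "and", "lifeless"],
    ["i", "am", "furious", "about", "the", "ending"],
    ["the", "ending", "made", "me", "angry"],
]
labels = np.array([0, 0, 0, 0, 0, 1, 1])

# Step 1: train a continuous skip-gram model (sg=1) on the corpus.
w2v = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=50)

# Step 2: compose sentence vectors from word embeddings
# (the paper uses an RNTN; averaging is used here only for illustration).
def sentence_vector(tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.vstack([sentence_vector(s) for s in sentences])

# Step 3: oversample the minority class with SMOTE to balance the training set.
# k_neighbors=1 because the toy minority class has only two examples.
X_bal, y_bal = SMOTE(k_neighbors=1, random_state=0).fit_resample(X, labels)
print(X_bal.shape, np.bincount(y_bal))  # equal class counts after oversampling
```

Because SMOTE interpolates between existing minority-class vectors in the embedding space, the synthetic samples are new points rather than duplicates, which is the property the paper relies on to avoid the small-disjunct problem.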


