A novel approach to generate a large scale of supervised data for short text sentiment analysis

Xiao Sun; Jiajin He

Abstract

Owing to the complexity of language and semantic structure and the relative scarcity of labeled data and context information, sentiment analysis is regarded as a challenging task in natural language processing, especially for short texts. Deep learning models need large amounts of training data to overcome data sparseness and over-fitting. We propose multi-granularity, text-oriented data augmentation techniques that generate large-scale artificial training data, and compare them with a generative adversarial network (GAN). We also propose a novel hybrid neural network architecture (LSCNN) that, combined with our data augmentation, outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the model. Experimental results show that the augmentation method combined with the neural network model achieves strong performance on sentiment analysis and short text classification without any handcrafted features. The approach was validated on a Chinese online comment dataset and a Chinese news headline corpus, where it outperforms many state-of-the-art models. The evidence indicates that the proposed data augmentation lets deep learning obtain a more accurate representation of the data distribution, which improves the generalization of the extracted features. The combination of data augmentation and the LSCNN fusion model is well suited to short text sentiment analysis, especially on small-scale corpora.
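The abstract does not spell out the augmentation operations, so the following is only a generic sketch of what label-preserving text augmentation at two granularities (word and character) can look like. The function names, the choice of swap and drop perturbations, and the whitespace tokenization are all assumptions made for illustration and are not taken from the paper; Chinese text would first need a word segmenter.

```python
import random


def swap_tokens(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def drop_token(tokens, rng):
    """Return a copy of tokens with one random element removed (never empties it)."""
    out = list(tokens)
    if len(out) < 2:
        return out
    k = rng.randrange(len(out))
    return out[:k] + out[k + 1:]


def augment(text, label, n_copies=4, seed=0):
    """Generate n_copies perturbed (text, label) pairs from one labeled sample.

    Hypothetical helper: each copy is perturbed either at word granularity
    or at character granularity, and the original label is kept unchanged.
    """
    rng = random.Random(seed)
    words = text.split()  # assumption: tokens are whitespace-separated
    if not words:
        return []
    samples = []
    for _ in range(n_copies):
        op = rng.choice([swap_tokens, drop_token])
        if rng.random() < 0.5:
            # Word granularity: perturb the word sequence.
            new_words = op(words, rng)
        else:
            # Character granularity: perturb the characters of one word.
            k = rng.randrange(len(words))
            new_words = list(words)
            new_words[k] = "".join(op(list(words[k]), rng))
        samples.append((" ".join(new_words), label))
    return samples


if __name__ == "__main__":
    for new_text, new_label in augment("the food was great", "pos"):
        print(new_label, "|", new_text)
```

Each labeled sample yields several noisy variants with the same label, which is how a small corpus can be expanded into a larger supervised training set; whether a given perturbation really preserves sentiment has to be checked for the data at hand.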

Bibliographic record

Source
Multimedia Tools and Applications | 2020, Issue 10 | pp. 5439-5459 | 21 pages
Authors
Xiao Sun; Jiajin He
Author affiliations

School of Computer and Information, Hefei University of Technology, No. 193 TunXi Road, BaoHe District, Hefei, China (both authors)

Original format: PDF
Language: English
Keywords
Data-driven feature learning; Data augmentation; Short text sentiment analysis; Model architectural designs; Large-scale artificial data;

Similar literature

1. Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text [J]. Sadam Al-Azani, El-Sayed M. El-Alfy. Procedia Computer Science, 2017, Issue 1.
2. SENTIMENT ANALYSIS OF TWITTER DATA FOR DEMONETIZATION IN INDIA - A TEXT MINING APPROACH [J]. Kaustav Roy, Disha Kohli, Rakeshkumar Kathirvel Senthil Kumar. Issues in Information Systems, 2017, Issue 4.
3. A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet [J]. Khan Farhan Hassan, Qamar Usman, Bashir Saba. Knowledge and Information Systems, 2017, Issue 3.
4. Semi-supervised Sentiment Analysis for Chinese Stock Texts in Scarce Labeled Data Scenario and Price Prediction [C]. Ji Zhaoyan, Yan Hongfei, Ying Siping. China Conference on Information Retrieval, 2020.
5. Fuzzification of Supervised and Semi-Supervised Convolution Neural Networks for Identification of Neutral Text in Sentiment Analysis [D]. Najar, Rawan. 2020.
6. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [O]. Rong Xu, QuanQiu Wang. 2015.
7. AN APPROACH TO CONSTRUCTION AND ANALYSIS OF A CORPUS OF SHORT RUSSIAN TEXTS INTENDED TO TRAIN A SENTIMENT CLASSIFIER [O]. Yuliya Rubtsova, Yury Zagorulko. 2014.
