Journal: Multimedia Tools and Applications
A novel approach to generate a large scale of supervised data for short text sentiment analysis

Abstract

Because of the complexity of linguistic and semantic structure and the relative scarcity of labeled data and context information, sentiment analysis is regarded as a challenging task in natural language processing, especially for short texts. Deep learning models need large amounts of training data to overcome data sparseness and over-fitting. We therefore propose multi-granularity, text-oriented data augmentation techniques that generate large-scale artificial training data, and we compare them with a generative adversarial network (GAN). We also propose a novel hybrid neural network architecture (LSCNN) which, combined with our data augmentation, outperforms many single neural network models. The proposed data augmentation method enhances the generalization ability of the proposed model. Experimental results show that the augmentation method combined with the neural network model achieves strong performance on sentiment analysis and short text classification without any handcrafted features. The approach was validated on a Chinese online comment dataset and a Chinese news headline corpus, where it outperforms many state-of-the-art models. The evidence indicates that the proposed data augmentation yields a more accurate representation of the data distribution for deep learning, which improves the generalization of the extracted features. The combination of data augmentation and the LSCNN fusion model is well suited to short text sentiment analysis, especially on small-scale corpora.
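The abstract does not spell out which augmentation operations are used, so the following is only an illustrative sketch of what augmentation at more than one granularity could look like, not the authors' method. It assumes two granularities (token-level edits and clause-level reordering) and assumes that these small edits preserve the sentiment label, which will not always hold (dropping a negation word, for instance, can flip the label). All function names are hypothetical.

```python
import random


def swap_adjacent(tokens, rng):
    """Token level: swap one randomly chosen adjacent pair of tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def drop_token(tokens, rng):
    """Token level: delete one randomly chosen token, keeping at least one."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return list(tokens[:i]) + list(tokens[i + 1:])


def shuffle_clauses(clauses, rng):
    """Clause level: reorder whole clauses, leaving each clause's tokens intact."""
    out = [list(c) for c in clauses]
    rng.shuffle(out)
    return out


def augment(clauses, label, n_copies, seed=0):
    """Generate n_copies variants of one labeled short text.

    clauses: list of token lists (one list per clause).
    The label is copied unchanged, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    if not clauses:
        return [([], label) for _ in range(n_copies)]
    samples = []
    for _ in range(n_copies):
        new = shuffle_clauses(clauses, rng)          # clause-level edit
        k = rng.randrange(len(new))                  # pick one clause
        op = rng.choice([swap_adjacent, drop_token])
        new[k] = op(new[k], rng)                     # token-level edit
        samples.append((new, label))
    return samples


if __name__ == "__main__":
    text = [["service", "was", "slow"], ["food", "was", "great"]]
    for variant, lbl in augment(text, "mixed", n_copies=3, seed=1):
        print(lbl, variant)
```

Seeding a private `random.Random` keeps the generated corpus reproducible, which matters when augmented data is compared against another generator such as a GAN.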
