Expert Systems with Applications

Text data augmentations: Permutation, antonyms and negation

Abstract

Text has traditionally been used to train automated classifiers for a multitude of purposes, such as classification, topic modelling and sentiment analysis. State-of-the-art LSTM classifiers require a large number of training examples to avoid biases and generalise successfully. Labelled data greatly improves classification results, but not all modern datasets include large numbers of labelled examples. Labelling is a complex task that can be expensive and time-consuming, and may introduce biases. Data augmentation methods create synthetic data from existing labelled examples, with the goal of improving classification results. These methods have been used successfully in image classification tasks, and recent research has extended them to text classification. We propose a method that uses sentence permutations to augment an initial dataset while retaining key statistical properties of the dataset. We evaluate our method on eight different datasets with a baseline Deep Learning process. This permutation method significantly improves classification accuracy, by an average of 4.1%. We also propose two further text augmentations that reverse the classification of each augmented example: antonym and negation. We test these two augmentations on three eligible datasets, and the results suggest an improvement in classification accuracy, averaged across all datasets, of 0.35% for antonym and 0.4% for negation compared to our proposed permutation augmentation.
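The abstract describes three augmentation schemes: sentence permutation, which keeps the original label, and antonym and negation augmentations, which reverse it. The minimal Python sketch below only illustrates these ideas for a binary-labelled example; it is not the authors' implementation, and the `ANTONYMS` lookup table and the prepend-"not" negation rule are hypothetical placeholders chosen for brevity.

```python
import random

# Illustrative sketch of the three augmentation ideas described in the abstract.
# The exact permutation scheme, antonym source and negation rules are not
# specified in the abstract; the details below are placeholder assumptions.

ANTONYMS = {"good": "bad", "happy": "sad", "love": "hate"}  # hypothetical lookup


def permute_sentences(text: str, seed=None) -> str:
    """Shuffle sentence order; word-level statistics of the example are preserved."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def antonym_augment(text: str, label: int):
    """Replace known words with antonyms and flip the (binary) label.
    Punctuation handling is omitted for brevity."""
    words = [ANTONYMS.get(w.lower(), w) for w in text.split()]
    return " ".join(words), 1 - label


def negation_augment(text: str, label: int):
    """Prepend a simple negation cue and flip the (binary) label."""
    return "not " + text, 1 - label  # naive rule, for illustration only


# Usage: permuted examples keep their label; antonym/negation examples reverse it.
example, label = "I love this film. The acting is good", 1
aug_perm = (permute_sentences(example, seed=0), label)
aug_anto = antonym_augment(example, label)
aug_neg = negation_augment(example, label)
print(aug_perm, aug_anto, aug_neg, sep="\n")
```

The key design point the sketch captures is that the label-reversing augmentations (antonym, negation) produce new examples for the opposite class, whereas permutation only reorders existing content and so leaves the class unchanged.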
