Journal: Concurrency, practice and experience

Optimizing unbalanced text classification tasks by integrating critical data mining and restricted rewriting techniques


Abstract

Oversampling has been widely used to improve classification on unbalanced data. Unlike structured data, however, text is built from words or characters, so instances oversampled in a numeric vector space can lose the word-level similarity they should preserve in semantic space. One way around this problem is to generate artificial samples directly by rewriting text. Unfortunately, existing rewriting techniques often break the grammatical structure and logic of the original text. In this article, we improve and constrain several existing text rewriting methods and propose an effective algorithm that mines the feature words of each class of text to guide the rewriting. In addition, by computing the similarity between texts, we divide the data of each class into critical and non-critical data and design a different rewriting process for each. Experiments on four unbalanced text classification tasks show that our method outperforms previous text rewriting methods, improving the model's classification accuracy by 1.7% to 2.9% and its AUC by 0.012 to 0.058. Ablation experiments further examine how individual variables and methods affect the results.
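The abstract does not specify the algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a whitespace tokenizer, a hypothetical smoothed document-frequency ratio for mining feature words (`mine_feature_words`), a hypothetical cosine-similarity threshold of 0.5 for separating critical from non-critical texts (`split_critical`), and a caller-supplied synonym table for restricted rewriting (`restricted_rewrite`). All function names, parameters, and the example data are illustrative assumptions.

```python
from collections import Counter
import math
import random


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mine_feature_words(docs, labels, target, k=5):
    # Hypothetical scoring: smoothed document-frequency ratio of each word in the
    # target class versus all other classes; the top-k words are the "feature words".
    in_df, out_df = Counter(), Counter()
    n_in = sum(1 for y in labels if y == target)
    n_out = len(labels) - n_in
    for doc, y in zip(docs, labels):
        (in_df if y == target else out_df).update(set(tokenize(doc)))
    score = {w: ((in_df[w] + 1) / (n_in + 2)) / ((out_df[w] + 1) / (n_out + 2)) for w in in_df}
    return set(sorted(score, key=lambda w: (-score[w], w))[:k])


def split_critical(docs, labels, target, threshold=0.5):
    # Hypothetical criterion: a target-class text is "critical" when its most similar
    # text from another class reaches `threshold`, i.e. it lies near the class boundary.
    vecs = [Counter(tokenize(d)) for d in docs]
    others = [v for v, y in zip(vecs, labels) if y != target]
    critical, non_critical = [], []
    for doc, v, y in zip(docs, vecs, labels):
        if y != target:
            continue
        nearest = max((cosine(v, o) for o in others), default=0.0)
        (critical if nearest >= threshold else non_critical).append(doc)
    return critical, non_critical


def restricted_rewrite(doc, feature_words, synonyms, max_swaps, rng):
    # Replace at most `max_swaps` non-feature words with synonyms; word order is kept
    # and feature words are never touched.
    tokens = tokenize(doc)
    candidates = [i for i, t in enumerate(tokens) if t not in feature_words and t in synonyms]
    for i in rng.sample(candidates, min(max_swaps, len(candidates))):
        tokens[i] = rng.choice(synonyms[tokens[i]])
    return " ".join(tokens)


def oversample(docs, labels, target, synonyms, n_new, seed=0):
    # Design choice (assumption): critical texts get one swap, non-critical texts two,
    # so boundary samples stay closer to their originals.
    rng = random.Random(seed)
    features = mine_feature_words(docs, labels, target)
    critical, non_critical = split_critical(docs, labels, target)
    pool = [(d, 1) for d in critical] + [(d, 2) for d in non_critical]
    if not pool:
        return []
    return [restricted_rewrite(d, features, synonyms, s, rng)
            for d, s in (pool[j % len(pool)] for j in range(n_new))]


# Illustrative usage on a toy dataset (class 1 is the minority class).
docs = ["the movie was great and fun", "great acting and a fun plot",
        "the movie was boring", "boring plot and bad acting",
        "the film was bad", "bad and boring movie"]
labels = [1, 1, 0, 0, 0, 0]
synonyms = {"movie": ["film"], "plot": ["story"], "great": ["good"], "the": ["this"]}
new = oversample(docs, labels, target=1, synonyms=synonyms, n_new=4)
print(new)
```

Because "great" is mined as a feature word of the minority class, it is protected from substitution even though the synonym table offers a replacement; only non-feature words such as "the" or "movie" may change. Constraining edits to single-word substitutions is one simple way to keep the generated text grammatical, at the cost of less diversity than free-form paraphrasing.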
