Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs

Abstract

Text classification models are widely used across natural language processing problems. Like any other machine learning model, these classifiers depend heavily on the size and quality of the training dataset; insufficient and unbalanced datasets lead to poor performance. One solution to poor datasets is to use world knowledge, in the form of knowledge graphs, to improve the training data. In this paper, we use ConceptNet and Wikidata to improve sexist tweet classification by two methods: (1) text augmentation and (2) text generation. In our text generation approach, we generate new tweets by replacing words with data acquired from ConceptNet relations in order to increase the size of our training set. This method is very helpful with frustratingly small datasets, preserves the label, and increases diversity. In our text augmentation approach, the number of tweets in each class remains the same, but the words of each tweet are augmented (by concatenation) with words extracted from their ConceptNet relations and with their descriptions extracted from Wikidata, so the range of each tweet increases. Our experiments show that our approach significantly improves sexist tweet classification across all of our machine learning models. Our approach can be readily applied to any other small dataset, such as hate speech or abusive language data, and to any text classification problem using any machine learning model.
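The two methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RELATED` and `DESCRIPTIONS` tables are hypothetical in-memory stand-ins for data that would be fetched from ConceptNet relations and Wikidata descriptions, and the function names `generate` and `augment` are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical stand-in for words reachable through ConceptNet relations.
RELATED: Dict[str, List[str]] = {
    "boss": ["manager", "supervisor"],
    "football": ["soccer"],
}

# Hypothetical stand-in for Wikidata item descriptions.
DESCRIPTIONS: Dict[str, str] = {
    "football": "team sport played with a ball",
}


def generate(tweet: str) -> List[str]:
    """Text generation: build one new tweet per single-word substitution.

    Each returned tweet would inherit the label of the original tweet,
    so the training set grows while the labels are preserved.
    """
    tokens = tweet.lower().split()
    new_tweets = []
    for i, token in enumerate(tokens):
        for alternative in RELATED.get(token, []):
            new_tweets.append(" ".join(tokens[:i] + [alternative] + tokens[i + 1:]))
    return new_tweets


def augment(tweet: str) -> str:
    """Text augmentation: keep one tweet, append related words and descriptions."""
    tokens = tweet.lower().split()
    extra: List[str] = []
    for token in tokens:
        extra.extend(RELATED.get(token, []))
        if token in DESCRIPTIONS:
            extra.append(DESCRIPTIONS[token])
    return " ".join(tokens + extra)


if __name__ == "__main__":
    print(generate("My boss likes football"))
    print(augment("My boss likes football"))
```

In this sketch `generate` turns one tweet into several, which increases the number of training examples, while `augment` returns a single, longer tweet, which leaves the class sizes unchanged. Whitespace tokenization and lowercasing are simplifications; the abstract does not say how words are selected for replacement or which relations are used.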

Bibliographic record

Source: Second Workshop on Abusive Language Online 2018 | 2018 | pp. 107-114 | 8 pages
Conference location: Brussels (BE)
Authors: Sima Sharifirad; Borna Jafarpour; Stan Matwin
Author affiliations:
Department of Computer Science, Dalhousie University, Halifax, Canada
Huawei Technology, Toronto, Canada
Department of Computer Science, Dalhousie University, Halifax, Canada
Original format: PDF
Language: English
