A Fuzzy Approach to Text Classification With Two-Stage Training for Ambiguous Instances

Han Liu; Pete Burnap; Wafa Alorainy; Matthew L. Williams

Abstract

Sentiment analysis is a very popular application area of text mining and machine learning. Popular methods include support vector machines, naive Bayes, decision trees, and deep neural networks. However, these methods generally belong to discriminative learning, which aims to distinguish one class from the others with a clear-cut outcome, given ground truth. In text classification, instances are naturally fuzzy (they can be multilabeled in some application areas) and thus cannot be considered clear-cut. This is especially true because the sentiment labels assigned to text represent an agreed level of subjective opinion among multiple human annotators, not indisputable ground truth. This has motivated researchers to develop fuzzy methods, which typically train classifiers through generative learning, i.e., a fuzzy classifier measures the degree to which an instance belongs to each class. Traditional fuzzy methods typically generate a single fuzzy classifier and apply a fixed defuzzification rule that outputs the class with the maximum membership degree. A single fuzzy classifier with this fixed rule is likely to run into text ambiguity on sentiment data: an instance may obtain equal membership degrees for both the positive and the negative class, so the maximum does not identify a unique class. In this paper, we focus on cyberhate classification, since the spread of hate speech via social media can disrupt social cohesion and lead to regional and community tensions. Automatic detection of cyberhate has thus become a priority research area.
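For a binary sentiment task, the fixed defuzzification rule and the ambiguous case can be written as follows, where μ_c(x) in [0, 1] denotes the membership degree of instance x in class c (this notation is assumed here for illustration and is not taken from the paper):

```latex
\hat{y}(x) = \arg\max_{c \in \{\mathrm{pos},\,\mathrm{neg}\}} \mu_c(x),
\qquad
x \text{ is ambiguous} \iff \mu_{\mathrm{pos}}(x) = \mu_{\mathrm{neg}}(x).
```

In the ambiguous case the argmax has no unique solution, so a single classifier using this rule can only fall back on an arbitrary tie-break.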
In particular, we propose a modified fuzzy approach with two-stage training that deals with text ambiguity and classifies four types of hate speech, namely religion, race, disability, and sexual orientation. We compare its performance with the popular methods above as well as with some existing fuzzy approaches. Features are prepared using bag-of-words and word embedding feature extraction, alongside correlation-based feature subset selection. The experimental results show that the proposed fuzzy method outperforms the other methods in most cases.
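The abstract does not spell out how the two stages interact, so the following Python sketch is only one plausible reading under stated assumptions, not the authors' algorithm. It assumes that stage 1 is a fuzzy classifier fitted on all training data, that the training instances it cannot resolve (tied top membership degrees) become the training set for a stage-2 classifier, and that test instances tied in stage 1 are routed to stage 2, with the majority class as a last resort. `TokenShareModel`, its n-gram membership function, the use of bigrams in stage 2, and `TwoStageClassifier` are all hypothetical, chosen only to keep the example small and runnable.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Doc = Sequence[str]  # a tokenized text


def defuzzify_max(memberships: Dict[str, float], tol: float = 1e-9) -> Optional[str]:
    """Max-membership rule; None means a tie or no evidence (ambiguous)."""
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= tol:
        return None  # no class has any support
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= tol:
        return None  # top two membership degrees are equal
    return ranked[0][0]


class TokenShareModel:
    """Hypothetical toy fuzzy model over word n-grams.

    Membership of a document in class c = mean over its n-grams of the share
    of that n-gram's training occurrences labelled c (unseen n-grams add 0).
    """

    def __init__(self, n: int = 1) -> None:
        self.n = n
        self.classes: List[str] = []
        self.counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def _grams(self, doc: Doc) -> List[Tuple[str, ...]]:
        return [tuple(doc[i:i + self.n]) for i in range(len(doc) - self.n + 1)]

    def fit(self, docs: Sequence[Doc], labels: Sequence[str]) -> "TokenShareModel":
        self.classes = sorted(set(labels))
        for doc, label in zip(docs, labels):
            for gram in self._grams(doc):
                self.counts[gram][label] += 1
        return self

    def memberships(self, doc: Doc) -> Dict[str, float]:
        grams = self._grams(doc)
        mu = {c: 0.0 for c in self.classes}
        for gram in grams:
            dist = self.counts.get(gram)  # .get avoids inserting unseen n-grams
            if not dist:
                continue
            total = sum(dist.values())
            for c in self.classes:
                mu[c] += dist[c] / total
        return {c: v / len(grams) for c, v in mu.items()} if grams else mu


class TwoStageClassifier:
    """Stage 2 is fitted only on training instances that stage 1 leaves tied."""

    def __init__(self, stage1: TokenShareModel, stage2: TokenShareModel) -> None:
        self.stage1, self.stage2 = stage1, stage2
        self.stage2_fitted = False
        self.default: Optional[str] = None

    def fit(self, docs: Sequence[Doc], labels: Sequence[str]) -> "TwoStageClassifier":
        self.stage1.fit(docs, labels)
        self.default = Counter(labels).most_common(1)[0][0]  # last-resort fallback
        ambiguous = [(d, y) for d, y in zip(docs, labels)
                     if defuzzify_max(self.stage1.memberships(d)) is None]
        if ambiguous:
            self.stage2.fit([d for d, _ in ambiguous], [y for _, y in ambiguous])
            self.stage2_fitted = True
        return self

    def predict(self, doc: Doc) -> Optional[str]:
        label = defuzzify_max(self.stage1.memberships(doc))
        if label is None and self.stage2_fitted:
            label = defuzzify_max(self.stage2.memberships(doc))
        return label if label is not None else self.default


# Hypothetical toy data, not from the paper's datasets.
train_docs = [["awful", "them"], ["kind", "them"], ["them", "us"], ["us", "them"]]
train_labels = ["hate", "none", "hate", "none"]
clf = TwoStageClassifier(TokenShareModel(n=1), TokenShareModel(n=2))
clf.fit(train_docs, train_labels)
print(clf.predict(["them", "us"]))  # hate (tied in stage 1, resolved by stage 2)
print(clf.predict(["us", "them"]))  # none
```

With these toy documents, "them us" and "us them" tie in stage 1 because their unigrams are split evenly between the two classes, while the bigram view in stage 2 separates them. Whether the second stage should use different features, a different learner, or the same learner on a reweighted subset is a design choice this sketch leaves open.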

Bibliographic Details

Source: IEEE Transactions on Computational Social Systems, 2019, No. 2, pp. 227-240 (14 pages)
Authors: Han Liu; Pete Burnap; Wafa Alorainy; Matthew L. Williams
Affiliations:

School of Computer Science and Informatics, Cardiff University, Cardiff, U.K.;

School of Social Scie;

Original format: PDF
Language: English
Keywords: Feature extraction; Sentiment analysis; Support vector machines; Training; Machine learning; Time-frequency analysis; Machine learning algorithms

Similar Literature

1. Generating fuzzy rules from training instances for fuzzy classification systems [J]. Shyi-Ming Chen, Fu-Ming Tsai. Expert Systems with Applications, 2008, No. 3.
2. A New multi-instance multi-label learning approach for image and text classification [J]. Yan Kaobi, Li Zhixin, Zhang Canlong. Multimedia Tools and Applications, 2016, No. 13.
3. Towards enhancing centroid classifier for text classification: A border-instance approach [J]. Deqing Wang, Junjie Wu, Hui Zhang. Neurocomputing, 2013, issue FEBa4.
4. Clustering based two-stage text classification requiring minimal training data [C]. Zhang Xue, Xiao Wang-xin. 2012 International Conference on Systems and Informatics (ICSAI), 2012.
5. Innovative two-stage fuzzy classification for unknown intrusion detection [D]. Jing, Xueyan Sharon. 2016.
6. Improved Distance Functions for Instance-Based Text Classification [O]. Khalil El Hindi, Bayan Abu Shawar, Reem Aljulaidan. 2020.
7. An Ontology-based Two-Stage Approach to Medical Text Classification with Feature Selection by Particle Swarm Optimisation [O]. Mahdi Abdollahi, Xiaoying Gao, Yi Mei. 2021.