Journal: IEEE Transactions on Computational Social Systems

A Fuzzy Approach to Text Classification With Two-Stage Training for Ambiguous Instances


Abstract

Sentiment analysis is a popular application area of text mining and machine learning. Popular methods include support vector machines, naive Bayes, decision trees, and deep neural networks. However, these methods generally belong to discriminative learning, which aims to distinguish one class from the others with a clear-cut outcome, assuming the presence of ground truth. In text classification, instances are naturally fuzzy (and can be multilabeled in some application areas), so they cannot be considered clear-cut, especially because labels assigned to sentiment in text represent an agreed level of subjective opinion among multiple human annotators rather than indisputable ground truth. This has motivated researchers to develop fuzzy methods, which typically train classifiers through generative learning, i.e., a fuzzy classifier is used to measure the degree to which an instance belongs to each class. Traditional fuzzy methods typically generate a single fuzzy classifier and employ a fixed defuzzification rule that outputs the class with the maximum membership degree. On sentiment data, a single fuzzy classifier with this fixed rule is likely to encounter text ambiguity, i.e., an instance may obtain equal membership degrees for both the positive and negative classes. In this paper, we focus on cyberhate classification, since the spread of hate speech via social media can have disruptive impacts on social cohesion and lead to regional and community tensions. Automatic detection of cyberhate has thus become a priority research area. In particular, we propose a modified fuzzy approach with two-stage training for dealing with text ambiguity and classifying four types of hate speech, namely, religion, race, disability, and sexual orientation, and compare its performance with that of the popular methods above as well as some existing fuzzy approaches. Features are prepared through the bag-of-words and word embedding feature extraction methods alongside the correlation-based feature subset selection method. The experimental results show that the proposed fuzzy method outperforms the other methods in most cases.
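To make the ambiguity problem concrete, the following is a minimal Python sketch of the fixed maximum-membership defuzzification rule described in the abstract, extended to flag instances whose top membership degrees are tied. The function names `defuzzify_max` and `split_ambiguous`, the tolerance `tol`, and the dictionary representation of membership degrees are assumptions made for illustration. The abstract does not say how the paper's second training stage resolves ambiguous instances, so the sketch only separates them out and does not attempt to reproduce that stage.

```python
from typing import Dict, List, Optional, Tuple


def defuzzify_max(memberships: Dict[str, float], tol: float = 1e-9) -> Optional[str]:
    """Return the class with the maximum membership degree.

    Returns None when the two highest degrees are equal (within tol),
    which is the text ambiguity situation described in the abstract.
    """
    if not memberships:
        raise ValueError("memberships must not be empty")
    # Sort classes from highest to lowest membership degree.
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and abs(ranked[0][1] - ranked[1][1]) <= tol:
        return None  # tie at the top: the fixed rule cannot decide
    return ranked[0][0]


def split_ambiguous(
    batch: List[Dict[str, float]],
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split a batch into clearly labeled instances and ambiguous ones.

    Returns (clear, ambiguous), where clear holds (index, label) pairs
    and ambiguous holds the indices that a later stage would handle.
    """
    clear: List[Tuple[int, str]] = []
    ambiguous: List[int] = []
    for idx, memberships in enumerate(batch):
        label = defuzzify_max(memberships)
        if label is None:
            ambiguous.append(idx)
        else:
            clear.append((idx, label))
    return clear, ambiguous
```

For example, calling `split_ambiguous` on a batch in which one instance has membership degrees of 0.5 for both the positive and negative classes would place that instance's index in the ambiguous list, while an instance with degrees 0.7 and 0.3 would be labeled by the maximum-membership rule as usual.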
