Fuzzy clustering-based semi-supervised approach for outlier detection in big text data

Farek Lazhar

Abstract

Text data is often polluted by outlier documents, which can significantly affect the performance of classification techniques. In this paper, we propose an approach based on fuzzy clustering to detect outlier documents. The approach rests on the assumption that documents assigned to different clusters with very close membership degrees are candidate outliers. First, a semantic data model is built using the Doc2Vec framework. Second, fuzzy clustering is performed. Third, candidate outlier documents are detected based on their degrees of membership. Finally, for each candidate outlier, the objective function is recomputed, and the candidate is considered an outlier when it considerably increases the objective function score. To show the effectiveness of our approach, two classification tests are applied, one on the original datasets and one on the datasets with outliers removed. Experimental results show that discarding outliers from the datasets improves the performance of classifiers.
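
The steps in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the document vectors already exist (random 2-D points stand in for Doc2Vec embeddings), uses a plain NumPy fuzzy c-means, and picks hypothetical thresholds `gap` and `factor`, none of which come from the paper. The confirmation rule is only one possible reading of "considerably increases the objective function score": a candidate is kept when the objective with the document exceeds the objective without it by more than `factor` times the median per-document contribution.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-5, seed=0):
    """Plain fuzzy c-means; returns memberships U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def candidate_outliers(U, gap=0.1):
    """Documents whose two highest membership degrees are very close (assumed rule)."""
    top2 = np.sort(U, axis=1)[:, -2:]
    return np.flatnonzero(top2[:, 1] - top2[:, 0] < gap)

def confirm_outliers(X, U, centers, candidates, m=2.0, factor=3.0):
    """Keep candidates whose presence raises the objective J by a large amount.

    Assumption: centers and memberships are held fixed, so J with the document
    minus J without it equals that document's own contribution to J.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    per_doc = ((U ** m) * d2).sum(axis=1)        # each document's share of J
    J_full = per_doc.sum()
    threshold = factor * np.median(per_doc)      # hypothetical cutoff
    confirmed = []
    for i in candidates:
        J_without = J_full - per_doc[i]          # J recomputed without document i
        if J_full - J_without > threshold:
            confirmed.append(i)
    return np.array(confirmed, dtype=int)

# Synthetic stand-in for document embeddings: two tight groups and one
# point halfway between them (index 40).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(0.0, 0.3, size=(20, 2)) + np.array([10.0, 0.0]),
    np.array([[5.0, 0.0]]),
])
U, centers = fuzzy_c_means(X, n_clusters=2)
candidates = candidate_outliers(U)
outliers = confirm_outliers(X, U, centers, candidates)
print("candidates:", candidates, "confirmed outliers:", outliers)
```

In practice the rows of `X` would be replaced by vectors inferred from a trained Doc2Vec model, and `gap` and `factor` would need tuning per dataset.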

Bibliographic details

Source
Progress in Artificial Intelligence | 2019, Issue 1 | 10 pages
Author
Farek Lazhar
Affiliation

University of Guelma, BP 411, Guelma, Algeria

Indexing information
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords
Outlier detection; Fuzzy clustering; Big text data; Doc2Vec modeling; Sparsity; High dimensionality; Classification
