Journal: Progress in Artificial Intelligence

Fuzzy clustering-based semi-supervised approach for outlier detection in big text data

Abstract

Text data is often polluted by outlier documents, which can significantly influence the performance of classification techniques. In this paper, we propose an approach based on fuzzy clustering to detect outlier documents. The approach rests on the assumption that documents assigned to different clusters with very close membership degrees are candidate outliers. First, a semantic data model is built using the Doc2Vec framework. Second, fuzzy clustering is performed. Third, candidate outlier documents are detected based on their degrees of membership. Finally, for each candidate outlier, the objective function is recomputed, and the candidate is considered an outlier when it considerably increases the objective function score. To show the effectiveness of our approach, two classification tests are applied, one on the original datasets and one on the datasets with outliers removed. Experimental results show that discarding outliers from the datasets improves the performance of classifiers.
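
The abstract gives no formulas or thresholds, so the following is only a minimal sketch of how the clustering, candidate-selection, and objective-recomputation steps might fit together. It assumes the documents have already been embedded (for example with Doc2Vec) and uses toy 2-D vectors in their place. The fuzzy c-means update rules are the standard ones; the function names, the `gap` and `factor` thresholds, and the rule "refit without the candidate and compare the objective" are illustrative assumptions, not the authors' exact procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed centers."""
    U = []
    for p in points:
        d2 = [max(sq_dist(p, c), 1e-12) for c in centers]
        # d2 is a squared distance, hence exponent -1/(m-1) rather than -2/(m-1)
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U

def update_centers(points, U, m=2.0):
    """Standard fuzzy c-means center update: mean weighted by u^m."""
    centers = []
    for j in range(len(U[0])):
        w = [u[j] ** m for u in U]
        total = sum(w)
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                        for d in range(len(points[0]))])
    return centers

def fit(points, init_centers, m=2.0, iters=50):
    """Run a fixed number of fuzzy c-means iterations."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)

def objective(points, centers, U, m=2.0):
    """Fuzzy c-means objective: sum over i, j of u_ij^m * ||x_i - c_j||^2."""
    return sum((u[j] ** m) * sq_dist(p, centers[j])
               for p, u in zip(points, U) for j in range(len(centers)))

def detect_outliers(points, init_centers, gap=0.2, factor=3.0, m=2.0):
    """Flag candidates whose removal lowers the objective considerably.

    gap and factor are assumed thresholds, not values from the paper.
    """
    centers, U = fit(points, init_centers, m)
    j_full = objective(points, centers, U, m)
    outliers = []
    for i, u in enumerate(U):
        top = sorted(u, reverse=True)
        if top[0] - top[1] >= gap:
            continue  # clearly belongs to one cluster: not a candidate
        rest = points[:i] + points[i + 1:]
        c_rest, u_rest = fit(rest, init_centers, m)
        j_rest = objective(rest, c_rest, u_rest, m)
        # Assumed rule: the candidate adds far more to the objective
        # than an average remaining document does.
        if j_full - j_rest > factor * j_rest / len(rest):
            outliers.append(i)
    return outliers

# Toy stand-ins for document embeddings: two tight groups plus one stray point.
vectors = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 0), (10, 1), (11, 0), (11, 1),
           (5.5, 9)]
init = [(0.5, 0.5), (10.5, 0.5)]
print(detect_outliers(vectors, init))  # prints [8]
```

In this toy run only the stray vector at index 8 has near-equal memberships in both clusters, and refitting without it lowers the objective sharply, so it is the only document flagged. Real Doc2Vec embeddings are high-dimensional, and both thresholds would need tuning on the data at hand.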
