Text Categorization Based on Fuzzy Soft Set Theory

Abstract

In this paper, we propose a new method for text categorization based on fuzzy soft set theory, called the fuzzy soft set classifier (FSSC). We use a fuzzy soft set representation derived from the bag-of-words representation, and define each term as a distinct word in the vocabulary of the document collection. The FSSC categorizes each document using the fuzzy c-means formula for classification, and uses fuzzy soft set similarity to measure the distance between two documents. We run experiments on the standard Reuters-21578 dataset with three weighting schemes (boolean, term frequency, and term frequency-inverse document frequency) and compare the performance of FSSC with four other classifiers: kNN, Bayesian, Rocchio, and SVM. We use precision, recall, F-measure, return-size, and running time for performance evaluation. The results show that there is no absolute winner. The FSSC has lower precision, recall, and F-measure than SVM and kNN, but runs faster than both. Compared with the Bayesian and Rocchio classifiers, the FSSC runs more slowly but has higher precision and F-measure.
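The pipeline outlined in the abstract (weight the bag of words, treat the weights as fuzzy memberships, compare documents by similarity) can be sketched roughly as follows. The abstract does not give the formulas, so everything here is an assumption for illustration: memberships are obtained by dividing each weight by the largest weight in the document, a class prototype is the plain mean of its training documents (a stand-in for the fuzzy c-means formula, not the formula itself), and similarity is a min/max ratio that may differ from the paper's measure. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.lower().split()))
    return {w: math.log(n / df[w]) + 1.0 for w in df}

def membership(doc, idf, scheme="tfidf"):
    """Map a document to term memberships in [0, 1] (assumed scaling)."""
    tf = Counter(w for w in doc.lower().split() if w in idf)
    if scheme == "boolean":
        raw = {w: 1.0 for w in tf}
    elif scheme == "tf":
        raw = {w: float(c) for w, c in tf.items()}
    elif scheme == "tfidf":
        raw = {w: c * idf[w] for w, c in tf.items()}
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    top = max(raw.values(), default=0.0)
    return {w: v / top for w, v in raw.items()} if top else {}

def prototype(vectors):
    """Mean membership per term; an assumed stand-in for a class center."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the membership values term by term
    return {w: s / len(vectors) for w, s in total.items()}

def similarity(a, b):
    """Min/max ratio between two membership maps (assumed measure)."""
    terms = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0

def classify(doc, prototypes, idf, scheme="tfidf"):
    """Assign the class whose prototype is most similar to the document."""
    m = membership(doc, idf, scheme)
    return max(prototypes, key=lambda c: similarity(m, prototypes[c]))

# Toy training data (hypothetical, not taken from Reuters-21578).
train = {
    "grain": ["wheat corn harvest", "corn wheat export tonnes"],
    "money": ["bank interest rate", "dollar bank currency rate"],
}
idf = fit_idf([d for docs in train.values() for d in docs])
prototypes = {
    c: prototype([membership(d, idf) for d in docs])
    for c, docs in train.items()
}
label = classify("wheat export prices", prototypes, idf)
```

Because classification needs only one similarity computation per class rather than one per training document, a prototype-based scheme of this kind would be expected to run faster than kNN, which is consistent with the timing result reported in the abstract.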
