Journal of Harbin Institute of Technology

A fuzzy method to learn text classifier from labeled and unlabeled examples




In text classification, labeling documents is tedious and costly because it consumes a great deal of expert time. Unlabeled documents, by contrast, are usually easy to obtain in large numbers with tools such as digital libraries, crawlers, and search engines. A novel fuzzy method is proposed for learning a text classifier from labeled and unlabeled examples. First, a Seeded Fuzzy c-means Clustering algorithm learns fuzzy clusters from the combined set of labeled and unlabeled examples. Second, examples with high confidence are selected from the resulting fuzzy clusters to construct a training data set. Finally, the constructed training set is used to train a Fuzzy Support Vector Machine (FSVM), which yields the text classifier. Empirical results on two benchmark datasets indicate that, by incorporating unlabeled examples into the learning process, the method performs significantly better than an FSVM trained on only a small number of labeled examples. The proposed method also performs at least as well as a related method, EM with Naive Bayes. One advantage of the proposed method is that it does not rely on parametric assumptions about the data, as is usually the case with the generative methods widely used in semi-supervised learning.
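The first two steps of the pipeline can be illustrated with a minimal sketch. The abstract does not specify how the seeding or the confidence selection work, so everything below is an assumption made for illustration: cluster centers are seeded with the mean of each class's labeled examples, labeled examples keep a fixed membership of 1 in their own class, the memberships follow the standard fuzzy c-means update, and "high confidence" means a maximum membership above a threshold. The names `seeded_fcm`, `select_confident`, and `threshold` are hypothetical and are not taken from the paper.

```python
import numpy as np


def seeded_fcm(X, y, n_classes, m=2.0, n_iter=50, eps=1e-9):
    """Illustrative seeded fuzzy c-means (not the paper's exact algorithm).

    X: (n, d) feature matrix.
    y: (n,) integer labels, with -1 marking unlabeled examples.
       Assumes every class has at least one labeled example.
    Returns the membership matrix U (n, n_classes) and the cluster centers.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    # Assumption: seed each center with the mean of that class's labeled examples.
    centers = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    onehot = np.eye(n_classes)[y[labeled]]
    for _ in range(n_iter):
        # Squared Euclidean distance from every example to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        # Standard FCM membership: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Assumption: labeled examples stay fully in their own class.
        U[labeled] = onehot
        # Standard FCM center update, weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers


def select_confident(U, threshold=0.8):
    """Keep examples whose largest membership reaches the threshold.

    Returns (indices, predicted class per index, membership per index).
    """
    conf = U.max(axis=1)
    idx = np.flatnonzero(conf >= threshold)
    return idx, U[idx].argmax(axis=1), conf[idx]
```

The selected indices, cluster labels, and memberships would then form the training set for the final step. One way to approximate an FSVM, again as an assumption rather than the authors' implementation, is to pass the memberships as per-example weights to a standard SVM, for instance through the `sample_weight` argument of scikit-learn's `SVC.fit`.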




