Conference: Multi-disciplinary international workshop on artificial intelligence

Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection

Abstract

This paper proposes a document clustering method built on a hybrid feature selection method that integrates a genetic-based wrapper with a ranking filter. The method is named Memetic Algorithm-Feature Selection (MA-FS). MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. The comparisons use two real-world criminal report document sets along with two popular benchmark datasets, Reuters and 20 Newsgroups. The F-Micro, F-Macro, and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that MA-FS outperformed both the FSGATC method and clustering on the entire feature space (ALL).
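To make the clustering and evaluation steps named in the abstract more concrete, the following is a minimal sketch of textbook spherical K-means together with an average document-to-centroid distance. It is not the authors' implementation: the function names, the toy vectors, the random initialization, and the use of cosine distance averaged over all documents as a stand-in for ADDC are all assumptions, since the abstract does not give the exact formulas. The feature selection stage (MA-FS) is not shown.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, k, iterations=20, seed=0):
    """Cluster term-weight vectors by cosine similarity to unit-length centroids."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    centroids = rng.sample(docs, k)  # assumption: random documents as initial centroids
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        # Update step: centroid = normalized sum of the member vectors.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids

def average_distance_to_centroid(docs, labels, centroids):
    """Mean cosine distance (1 - similarity) from each document to its centroid.

    Assumed stand-in for ADDC; the paper may define the measure differently.
    """
    docs = [normalize(d) for d in docs]
    total = sum(1.0 - cosine(d, centroids[lab]) for d, lab in zip(docs, labels))
    return total / len(docs)

# Hypothetical toy term-frequency vectors: two documents per topic, four terms.
toy_docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 3], [0, 0, 2, 3]]
toy_labels, toy_centroids = spherical_kmeans(toy_docs, k=2)
toy_addc = average_distance_to_centroid(toy_docs, toy_labels, toy_centroids)
```

Under this reading, a lower average distance indicates tighter clusters. A feature selection method such as MA-FS would be applied before this step to shrink the term dimension of each document vector.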