Conference: Multi-Disciplinary International Workshop on Artificial Intelligence

Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection

Abstract

In this paper, a document clustering method with a hybrid feature selection method is proposed. The hybrid method, named Memetic Algorithm-Feature Selection (MA-FS), integrates a genetic-based wrapper method with a ranking filter. MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. Two real-world criminal report document sets and two popular benchmark datasets, Reuters and 20newsgroup, were used in the comparisons. The F-Micro, F-Macro, and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that MA-FS outperformed both FSGATC and clustering on the entire feature space (ALL).
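To make the clustering and evaluation steps named in the abstract more concrete, the following is a minimal sketch of Spherical K-means together with an ADDC-style score. It is not the authors' implementation: the function names, the random initialization, the fixed iteration count, and the use of cosine distance inside `addc` are all assumptions made for illustration, and the feature selection step (MA-FS) is not shown.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster term vectors by cosine similarity to unit-length centroids.

    Assumption: centroids are initialized from k randomly sampled documents.
    """
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(d, centroids[j])) for d in docs]
        # Update step: sum the members, then project back onto the unit sphere.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


def addc(docs, labels, centroids):
    """Average distance of documents to their cluster centroid.

    Assumption: distance is cosine distance (1 - cosine similarity).
    """
    docs = [normalize(d) for d in docs]
    total = sum(1.0 - cosine(d, centroids[lab]) for d, lab in zip(docs, labels))
    return total / len(docs)


# Toy term-frequency vectors: two documents per topic (hypothetical data).
toy_docs = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 3]]
toy_labels, toy_centroids = spherical_kmeans(toy_docs, k=2)
toy_addc = addc(toy_docs, toy_labels, toy_centroids)
```

A lower ADDC value indicates tighter clusters. In a pipeline like the one the abstract describes, the same score would be computed once on the full feature space and once on the selected feature subset so the two can be compared.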