Journal of Computational Science

A new feature selection method to improve the document clustering using particle swarm optimization algorithm


Abstract

The large volume of text on the Internet and in modern applications makes this information difficult to manage. Text clustering is an appropriate tool for handling an enormous number of text documents by grouping them into coherent clusters. However, document size decreases the effectiveness of text clustering. Moreover, text documents contain sparse and uninformative features (i.e., noisy, irrelevant, and unnecessary features), which also affect its effectiveness. Feature selection is a primary unsupervised learning method used to select the informative text features and create a new subset of a document's features, with the aim of increasing the effectiveness of the underlying clustering algorithm. Recently, several complex optimization problems have been solved successfully using metaheuristic algorithms. This paper proposes a novel feature selection method based on the particle swarm optimization (PSO) algorithm, called FSPSOTC, which solves the feature selection problem by creating a new subset of informative text features. This new subset can improve the performance of text clustering and reduce computational time. Experiments were conducted on six standard text datasets with differing characteristics that are commonly used in the text clustering domain. The results show that the proposed method (FSPSOTC) enhanced the effectiveness of text clustering by working with a new subset of informative features. The proposed method is compared with other well-known text feature selection algorithms, namely a feature selection method using a genetic algorithm to improve text clustering (FSGATC) and a feature selection method using the harmony search algorithm to improve text clustering (FSHSTC). (C) 2017 Elsevier B.V. All rights reserved.
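The abstract does not state how FSPSOTC encodes particles or scores a feature subset, so the following is only a minimal sketch of one common way PSO can be applied to feature selection, not the authors' method. It assumes a binary PSO in which each particle is a 0/1 mask over features and a sigmoid of the velocity gives the probability of keeping a feature. The names `fitness` and `binary_pso_select`, the toy objective (mean weight of the selected features), and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def fitness(mask, weights):
    """Toy objective (an assumption, not from the paper):
    mean weight of the features the mask keeps."""
    selected = [w for w, keep in zip(weights, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


def binary_pso_select(weights, n_particles=20, n_iters=50,
                      inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a 0/1 feature mask with a binary PSO.

    Returns (best_mask, best_fitness). All parameter defaults are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(weights)

    # Each particle is a candidate feature mask with a real-valued velocity.
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]

    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, weights) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp velocity so the sigmoid does not saturate.
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))
                # Sigmoid transfer: velocity becomes P(feature is kept).
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0

            f = fitness(pos[i], weights)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

    return gbest, gbest_fit


# Hypothetical per-feature weights (e.g. averaged term weights).
weights = [0.9, 0.1, 0.05, 0.8, 0.02, 0.7]
mask, score = binary_pso_select(weights)
print(mask, score)
```

The toy objective favors very small subsets, so a realistic objective would need to reflect how informative the retained terms are for clustering. The selected mask would then be used to drop columns from the document-term matrix before running the clustering algorithm.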
