Expert Systems with Applications

An enhanced ACO algorithm to select features for text categorization and its parallelization


Abstract

Feature selection is an indispensable preprocessing step for the effective analysis of high-dimensional data. It removes irrelevant features, improves predictive accuracy, and increases the comprehensibility of the models built by classifiers that are sensitive to the feature set. Finding an optimal feature subset in a very large domain is intractable, and many such feature selection problems have been shown to be NP-hard. Optimization algorithms are therefore often designed for NP-hard problems to find near-optimal solutions in practical time. This paper formulates text feature selection as a combinatorial problem and proposes an Ant Colony Optimization (ACO) algorithm to find a near-optimal solution. It differs from the earlier algorithm by Aghdam et al. in that it includes a statistics-based heuristic function and a local search. The algorithm aims to determine a solution that contains 'n' distinct features for each category. Optimization algorithms based on wrapper models give better results, but the processes involved are time-intensive. The availability of parallel architectures, such as a cluster of machines connected through fast Ethernet, has increased interest in parallelizing such algorithms. The proposed ACO algorithm was parallelized and demonstrated on a cluster of up to six machines. Documents from the 20 Newsgroups benchmark dataset were used for experimentation. Features selected by the proposed algorithm were evaluated with a Naive Bayes classifier and compared with standard feature selection techniques. The classifier's performance improved with the features selected by the enhanced ACO and local search. The classifier's error decreased over iterations, and the number of positive features was observed to increase with the number of iterations.
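
The abstract does not spell out the algorithm's update rules, so the following is a minimal, hypothetical Python sketch of ACO-based feature selection in the spirit described: a chi-squared score stands in for the statistics-based heuristic, a single-swap local search refines each ant's subset, and a Naive Bayes classifier scores candidates, as in the paper's evaluation. The function names and parameters (aco_select, n_ants, alpha, beta, rho, etc.) and the choice of chi-squared are illustrative assumptions, not the authors' implementation; X is assumed to be a nonnegative document-term count matrix and y the category labels.

```python
# Hypothetical sketch of ACO feature selection with a statistics-based heuristic,
# a simple local search, and Naive Bayes wrapper evaluation (assumptions throughout).
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def evaluate(X, y, subset):
    """Score a candidate feature subset by Naive Bayes cross-validated accuracy."""
    return cross_val_score(MultinomialNB(), X[:, list(subset)], y, cv=3).mean()

def local_search(X, y, subset, n_total, score, tries=5, rng=None):
    """Try a few single-feature swaps; keep a swap only if it improves the score."""
    rng = rng or np.random.default_rng()
    subset = list(subset)
    for _ in range(tries):
        out_idx = rng.integers(len(subset))
        candidate = rng.choice(list(set(range(n_total)) - set(subset)))
        trial = subset.copy()
        trial[out_idx] = candidate
        trial_score = evaluate(X, y, trial)
        if trial_score > score:
            subset, score = trial, trial_score
    return subset, score

def aco_select(X, y, n_features=50, n_ants=10, n_iter=20,
               alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Return the best feature subset found and its cross-validated accuracy."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    eta, _ = chi2(X, y)                      # statistics-based heuristic (assumed: chi-squared)
    eta = np.nan_to_num(eta) + 1e-9
    tau = np.ones(d)                         # uniform initial pheromone
    best_subset, best_score = None, -np.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            prob = (tau ** alpha) * (eta ** beta)
            prob /= prob.sum()
            subset = rng.choice(d, size=n_features, replace=False, p=prob)
            score = evaluate(X, y, subset)
            subset, score = local_search(X, y, subset, d, score, rng=rng)
            if score > best_score:
                best_subset, best_score = list(subset), score
        tau *= (1 - rho)                     # pheromone evaporation
        tau[best_subset] += best_score       # reinforce the best subset found so far
    return best_subset, best_score
```

A call such as aco_select(X_train, y_train, n_features=100) would return the selected feature indices and their wrapper score; the per-category selection of 'n' features mentioned in the abstract is omitted here for brevity.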
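The paper parallelizes the ACO across a cluster of up to six machines connected by fast Ethernet, but the abstract gives no implementation details. As a simplified single-machine analogue, and explicitly not the paper's cluster setup, the independent ant evaluations within one iteration can be scored concurrently with a process pool; every name and parameter below is an illustrative assumption.

```python
# Hypothetical sketch: score one ACO iteration's ant subsets concurrently.
# A local process pool stands in for the paper's cluster of machines.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def score_subset(X, y, subset):
    """Naive Bayes cross-validated accuracy of one candidate feature subset."""
    return cross_val_score(MultinomialNB(), X[:, list(subset)], y, cv=3).mean()

def run_ants_parallel(X, y, tau, eta, n_ants=10, n_features=50,
                      alpha=1.0, beta=1.0, workers=4, seed=0):
    """Build n_ants feature subsets and score them in parallel; returns (subset, score) pairs."""
    rng = np.random.default_rng(seed)
    prob = (tau ** alpha) * (eta ** beta)
    prob /= prob.sum()
    subsets = [rng.choice(X.shape[1], size=n_features, replace=False, p=prob)
               for _ in range(n_ants)]
    # score_subset must live at module level so worker processes can pickle it.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_subset, [X] * n_ants, [y] * n_ants, subsets))
    return list(zip(subsets, scores))
```

After all ants of an iteration return, the pheromone update can be applied sequentially on the coordinating process, which mirrors the synchronization point a distributed implementation would also need.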
