2012 20th Iranian Conference on Electrical Engineering

Two-stage text feature selection method using fuzzy entropy measure and ant colony optimization

Abstract

Text categorization is widely used to organize documents in digital form. As the number of digital documents grows, automated text categorization has emerged as an appropriate tool for assigning documents to predefined categories. A common problem in text categorization is the high dimensionality of the feature space: most of the features are irrelevant or redundant, which hurts classifier performance. Feature selection is therefore used to reduce the feature space and thereby improve classifier performance. In this paper, a two-stage method is proposed for text feature selection. In the first stage, a filtering technique based on the fuzzy entropy measure is applied: features are ranked by their fuzzy entropy values, and features whose values exceed a threshold are removed from the feature set. In the second stage, an ant colony optimization approach is employed to select features from the feature space reduced in the first stage. The proposed method is evaluated with a k-nearest neighbor classifier on the top 10 categories of Reuters-21578. The experimental results show the efficiency of the proposed method.
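The abstract does not give the fuzzy entropy formula, the membership function, the threshold, or the ant colony parameters, so the following is only a minimal Python sketch of a two-stage pipeline of this kind, under stated assumptions. It assumes that (1) documents are sparse term-frequency dicts, (2) the membership of term t in class c is that class's share of the term's total frequency, (3) the fuzzy entropy is a De Luca-Termini style entropy averaged over classes, so higher means less discriminative, (4) each ant builds a fixed-size subset by sampling features with probability proportional to τ^α·η^β, with fitness equal to cosine k-NN accuracy on a held-out validation set, and (5) pheromone evaporates and is then reinforced on each iteration's best subset. All function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def fuzzy_entropy_scores(docs, labels, vocab_size):
    """Score each term with a De Luca-Termini style fuzzy entropy over classes.

    docs: list of sparse term-frequency dicts {term_id: count}; labels: class ids.
    A term concentrated in one class scores 0; a term spread evenly over the
    classes scores high, i.e. a higher score means a less discriminative term.
    """
    classes = sorted(set(labels))
    tf = {c: [0.0] * vocab_size for c in classes}  # per-class term totals
    for doc, y in zip(docs, labels):
        for t, n in doc.items():
            tf[y][t] += n
    scores = []
    for t in range(vocab_size):
        total = sum(tf[c][t] for c in classes)
        if total == 0:
            scores.append(float("inf"))  # term never occurs: always filtered out
            continue
        h = 0.0
        for c in classes:
            # Assumption (not from the paper): membership = class share of term frequency.
            mu = tf[c][t] / total
            for m in (mu, 1.0 - mu):
                if m > 0:
                    h -= m * math.log(m)
        scores.append(h / len(classes))
    return scores


def filter_stage(scores, threshold):
    """Stage 1: drop every term whose fuzzy entropy is above the threshold."""
    return [t for t, s in enumerate(scores) if s <= threshold]


def cosine(a, b, feats):
    """Cosine similarity of two sparse docs restricted to the features in feats."""
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in feats)
    na = math.sqrt(sum(a.get(t, 0) ** 2 for t in feats))
    nb = math.sqrt(sum(b.get(t, 0) ** 2 for t in feats))
    return dot / (na * nb) if na and nb else 0.0


def knn_accuracy(train, train_y, test, test_y, feats, k=3):
    """Accuracy of a majority-vote cosine k-NN using only the features in feats."""
    correct = 0
    for doc, y in zip(test, test_y):
        sims = sorted(((cosine(doc, d, feats), ty) for d, ty in zip(train, train_y)),
                      key=lambda p: p[0], reverse=True)
        votes = Counter(ty for _, ty in sims[:k])
        correct += votes.most_common(1)[0][0] == y
    return correct / len(test)


def aco_select(candidates, heuristic, fitness, subset_size,
               n_ants=10, n_iter=20, alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Stage 2: a basic ant colony search over fixed-size subsets of candidates.

    heuristic: positive desirability per term id; fitness: callable(subset) -> float.
    """
    rng = random.Random(seed)
    tau = {t: 1.0 for t in candidates}  # pheromone per candidate feature
    best, best_fit = None, -1.0
    for _ in range(n_iter):
        iter_best, iter_fit = None, -1.0
        for _ in range(n_ants):
            pool, subset = list(candidates), []
            while pool and len(subset) < subset_size:
                # Sample the next feature with probability ~ tau^alpha * eta^beta.
                w = [tau[t] ** alpha * heuristic[t] ** beta for t in pool]
                t = rng.choices(pool, weights=w)[0]
                subset.append(t)
                pool.remove(t)
            f = fitness(subset)
            if f > iter_fit:
                iter_best, iter_fit = subset, f
        for t in tau:
            tau[t] *= 1.0 - rho  # evaporation on every feature
        for t in iter_best:
            tau[t] += iter_fit  # reinforce the features of the iteration-best subset
        if iter_fit > best_fit:
            best, best_fit = sorted(iter_best), iter_fit
    return best, best_fit
```

To run both stages, one could compute `scores = fuzzy_entropy_scores(train, train_y, vocab_size)`, keep `filter_stage(scores, threshold)` as the candidates, use `heuristic = [1.0 / (1.0 + s) for s in scores]` so that low-entropy (more class-specific) terms are favored, and pass `lambda feats: knn_accuracy(train, train_y, val, val_y, feats)` as the fitness. Scoring subsets with the same k-NN classifier used for evaluation makes the second stage a wrapper search over the filtered candidates; the heuristic and the iteration-best pheromone update are illustrative choices, not details confirmed by the abstract.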