Conference: International Conference on Management Science and Management Innovation

An Algorithm of Feature Selection in Text Categorization Based on Gini-index



Abstract

With the rapid development of the World Wide Web, text categorization has come to play an important role in organizing and processing large amounts of text data. The first and major problem in text categorization is how to select the best feature subset from the original high-dimensional feature space, in order to reduce dimensionality and improve classification performance. The Gini-Index is an attribute-selection criterion that was used early on for attribute selection in decision trees, where it performs at a near state-of-the-art level. However, relatively little work has been done on applying the Gini-Index to text feature selection. We use an improved Gini-Index for text feature selection, constructing a measure function based on it. We compare it with four other feature selection measures, using two kinds of classifiers on two different document corpora. The experimental results show that its performance is comparable to that of other text feature selection approaches. Its advantage lies in the time complexity of the algorithm.
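To make the idea of scoring terms with a Gini-based measure concrete, here is a minimal Python sketch. It is an assumption-laden illustration: it uses the classic decision-tree Gini impurity, 1 - sum(p_i^2), and ranks each term by the impurity reduction obtained when documents are split on whether they contain that term. The abstract does not give the paper's improved measure function, so this is not the authors' method. The names `gini_impurity`, `gini_gain`, `select_features`, and the toy corpus are all hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Classic Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(docs, labels, term):
    """Impurity reduction from splitting documents on presence of `term`.

    `docs` is a list of sets of terms; `labels` holds one class per document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    weighted = (
        len(with_term) / n * gini_impurity(with_term)
        + len(without_term) / n * gini_impurity(without_term)
    )
    return gini_impurity(labels) - weighted


def select_features(docs, labels, k):
    """Return the k terms with the largest gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-gini_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"goal", "match"},
    {"match", "team"},
    {"stock", "market"},
    {"market", "team"},
]
labels = ["sport", "sport", "finance", "finance"]

top_terms = select_features(docs, labels, 2)
print(top_terms)
```

In this toy corpus, "match" and "market" each separate the two classes perfectly, while "team" appears equally in both classes and yields no impurity reduction. Scoring every term requires only one pass over the documents per term, which is consistent with the abstract's emphasis on low time complexity, although the paper's own cost analysis is not reproduced here.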
