Journal: Computing and Informatics

OPTIMAL FEATURE SUBSET SELECTION BASED ON COMBINING DOCUMENT FREQUENCY AND TERM FREQUENCY FOR TEXT CLASSIFICATION


Abstract

Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimensionality of the feature space lowers the computational cost and improves the accuracy of the text classification system. Hence, a proper subset of the significant features of the text corpus must be identified in order to classify the data in less computational time and with higher accuracy. In this research, a novel feature selection method that combines document frequency and term frequency (FS-DFTF) is proposed to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on various popular benchmark text corpora. The experimental results confirm that the proposed method achieves better classification accuracy than other feature selection techniques.
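The abstract does not state the FS-DFTF scoring formula, so the following is only a minimal sketch of the general idea of ranking terms by a combination of document frequency and term frequency. The scoring rule `(df / N) * log(1 + tf)`, the function names `df_tf_scores` and `select_top_k`, and the toy corpus are all assumptions made for illustration; they are not the method defined in the paper.

```python
import math
from collections import Counter


def df_tf_scores(docs):
    """Score each term by an illustrative mix of document and term frequency.

    ASSUMPTION: score = (df / N) * log(1 + tf) is a hypothetical placeholder,
    not the FS-DFTF formula, which the abstract does not give.
    """
    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    tf = Counter()  # total occurrences of each term across the corpus
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenisation
        tf.update(tokens)
        df.update(set(tokens))
    return {term: (df[term] / n_docs) * math.log1p(tf[term]) for term in df}


def select_top_k(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = df_tf_scores(docs)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Toy corpus, invented for this example only.
docs = [
    "cheap pills cheap offer",
    "meeting agenda offer",
    "cheap meeting room",
]
print(select_top_k(docs, 2))
```

In a pipeline like the one the abstract describes, the selected terms would then define the reduced feature space passed to a classifier such as Naive Bayes or a Support Vector Machine; the sketch stops at the selection step.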
