PLoS One

Designing a hybrid dimension reduction for improving the performance of Amharic news document classification


Abstract

The volume of Amharic digital documents has grown rapidly in recent years, which makes automatic document categorization essential. In this paper, we present a novel dimension reduction approach that improves classification accuracy by combining feature selection and feature extraction. The method uses Information Gain (IG), the Chi-square test (CHI), and Document Frequency (DF) to select important features, and Principal Component Analysis (PCA) to refine the selected features. We evaluate the proposed method on a dataset containing 9 news categories. Our experimental results show that it outperforms the other methods evaluated: classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51%, and 10.19% higher than with IG, CHI, and DF, respectively. Further work is required, since classification accuracy still decreases as the feature size is reduced to save computational time.
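The hybrid pipeline described above (score terms with DF, CHI, and IG, then refine the selected features with PCA) can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The toy English corpus is hypothetical; a real run would use tokenized Amharic news text. The abstract does not say how the three selection criteria are merged, so taking the union of each top-k list is an assumption. CHI is taken as the maximum over categories, which is also an assumption. PCA is computed by power iteration with deflation rather than by a library routine. The helper names `select_features`, `vectorize`, and `pca` are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_features(docs, k):
    """Score every term with DF, CHI and IG and keep the union of each top-k list.

    docs: list of (category, tokens). The union rule is an assumed combination.
    """
    n = len(docs)
    cats = sorted({c for c, _ in docs})
    vocab = sorted({t for _, toks in docs for t in toks})
    n_cat = Counter(c for c, _ in docs)
    # df[t][c]: number of documents of category c that contain term t
    df = {t: Counter() for t in vocab}
    for c, toks in docs:
        for t in set(toks):
            df[t][c] += 1

    h_c = entropy([n_cat[c] / n for c in cats])  # H(C), class entropy
    scores = {"df": {}, "chi": {}, "ig": {}}
    for t in vocab:
        n_t = sum(df[t].values())  # documents containing t (always >= 1 here)
        scores["df"][t] = n_t
        # CHI: maximum over categories of the 2x2 chi-square statistic (assumed)
        best = 0.0
        for c in cats:
            a = df[t][c]            # t present, in c
            b = n_t - a             # t present, not in c
            cc = n_cat[c] - a       # t absent, in c
            d = n - a - b - cc      # t absent, not in c
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        scores["chi"][t] = best
        # IG = H(C) - P(t) H(C|t) - P(~t) H(C|~t)
        absent = n - n_t
        h_t = entropy([df[t][c] / n_t for c in cats])
        h_nt = entropy([(n_cat[c] - df[t][c]) / absent for c in cats]) if absent else 0.0
        scores["ig"][t] = h_c - (n_t / n) * h_t - (absent / n) * h_nt

    selected = set()
    for s in scores.values():
        # ties are broken alphabetically so the result is deterministic
        selected.update(sorted(vocab, key=lambda t: (-s[t], t))[:k])
    return sorted(selected)


def vectorize(docs, features):
    """Term-frequency vector of each document over the selected features."""
    return [[toks.count(f) for f in features] for _, toks in docs]


def pca(rows, n_components, iters=100):
    """Project rows onto the top principal components (power iteration + deflation)."""
    m, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(dim)]
    X = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (m - 1) for j in range(dim)]
           for i in range(dim)]
    comps = []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(dim)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                v = [0.0] * dim
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim))
        comps.append(v)
        # deflate: remove the found component so the next pass finds the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(x[j] * c[j] for j in range(dim)) for c in comps] for x in X]


# Hypothetical toy corpus: (category, tokens)
docs = [
    ("sport", ["goal", "match", "team", "the"]),
    ("sport", ["match", "team", "win", "the"]),
    ("politics", ["vote", "party", "the", "law"]),
    ("politics", ["party", "law", "minister", "the"]),
]
features = select_features(docs, k=2)
print(features)  # ['law', 'match', 'the']
reduced = pca(vectorize(docs, features), n_components=1)
print(reduced)
```

In this toy run, DF alone favours the uninformative term "the", while CHI and IG favour category-specific terms. PCA then compresses the selected term-frequency vectors into a single component that separates the two categories. That is the intuition behind pairing selection with extraction, although the paper's exact procedure may differ.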