Journal: Industrial Engineering & Management Systems

A Feature Selection Method for Classifying Highly Similar Text Documents



Abstract

In the era of big data, the importance of data classification is increasing. However, when it comes to classifying text documents, several obstacles degrade classification performance. These include multi-class documents, high levels of similarity between classes, class size imbalance, a high-dimensional representation space, and a low frequency of unique and discriminative features. To overcome these obstacles and improve classification performance, this paper proposes a novel feature selection method that effectively utilizes both unique and overlapping features. In general, feature selection methods have ignored unique features, which occur in only one class, because of their low frequency, even though they provide better discriminative power. Conversely, overlapping features, which are found in several classes with high frequency, have also been less preferred because of their low discriminative power. The proposed feature selection method attempts to use these two types of features in a complementary way, with the aim of improving overall classification performance for highly similar text documents. Extensive numerical analyses have been conducted on three benchmark datasets with a support vector machine (SVM) classifier. The results show that the proposed method is superior to conventional feature selection methods, such as the global feature set, local feature set, discriminative feature set, and information gain, not only for classes with high similarity but also in general classification performance.
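
The distinction between unique and overlapping features can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not give the scoring or selection rule, so the function name `split_features`, the toy corpus, and the definition of "overlapping" as "seen in two or more classes" (ignoring frequency) are assumptions made only for illustration.

```python
from collections import defaultdict


def split_features(docs):
    """Partition vocabulary terms by how many classes they appear in.

    docs: list of (class_label, list_of_tokens) pairs.
    Returns (unique, overlapping) where
      unique      maps each class to the terms seen only in that class,
      overlapping is the set of terms seen in two or more classes.
    Hypothetical helper for illustration; frequency is ignored here.
    """
    classes_by_term = defaultdict(set)
    for label, tokens in docs:
        for term in tokens:
            classes_by_term[term].add(label)

    unique = defaultdict(set)
    overlapping = set()
    for term, labels in classes_by_term.items():
        if len(labels) == 1:
            # The term occurs in exactly one class: a "unique" feature.
            unique[next(iter(labels))].add(term)
        else:
            # The term is shared by several classes: an "overlapping" feature.
            overlapping.add(term)
    return dict(unique), overlapping


# Toy corpus (invented for this example).
docs = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["team", "coach"]),
    ("politics", ["vote", "team", "policy"]),
]
unique, overlapping = split_features(docs)
```

A selection method along the lines the abstract describes would then have to weigh both groups, for example by keeping the rare unique terms for their discriminative power while still using the frequent overlapping terms; how the paper combines them is not stated in the abstract.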
