Asia Information Retrieval Symposium (AIRS 2005); October 13-15, 2005; Jeju Island (KR)

An Examination of Feature Selection Frameworks in Text Categorization

Abstract

Feature selection, an important task in text categorization, is used for dimensionality reduction. It can be performed either locally or globally. In local selection, a distinct feature set is derived for each class, so the number of feature sets depends on the number of classes. In contrast, global feature selection uses a single universal feature set, which is assumed to preserve the characteristics of all classes. Furthermore, feature selection can be based on the relevant feature set only (local dictionary) or on both the relevant and irrelevant feature sets (universal dictionary). In this paper, we explore these feature selection frameworks for text categorization on the Reuters(10) and Reuters(115) datasets (variants of the Reuters-21578 corpus). We then investigate the efficiency of 7 different local or global feature selection methods with respect to the use of local and universal dictionaries. Our experiments show that local feature selection with a local dictionary yields the best categorization results.
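
The distinction between local and global selection, and between local and universal dictionaries, can be illustrated with a minimal sketch. The abstract does not name the 7 selection methods, so everything below is an assumption made for illustration: the toy corpus, the scoring function (difference in document-frequency rates between a class and the rest), the use of the maximum per-class score for global ranking, and all function names. It is not the authors' implementation.

```python
from collections import Counter

# Toy corpus of (tokens, class label) pairs. Purely illustrative data.
DOCS = [
    (["wheat", "grain", "export"], "grain"),
    (["grain", "harvest", "price"], "grain"),
    (["oil", "crude", "price"], "crude"),
    (["crude", "barrel", "export"], "crude"),
]

def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens, _ in docs:
        df.update(set(tokens))
    return df

def score(term, cls, docs):
    """Assumed score: P(term | class) - P(term | all other classes)."""
    pos = [d for d in docs if d[1] == cls]
    neg = [d for d in docs if d[1] != cls]
    p_pos = doc_freq(pos)[term] / len(pos)
    p_neg = doc_freq(neg)[term] / len(neg) if neg else 0.0
    return p_pos - p_neg

def dictionary(cls, docs, kind):
    """Candidate terms for a class.

    'local'     : terms from documents relevant to the class only.
    'universal' : terms from relevant and irrelevant documents alike.
    """
    if kind == "local":
        docs = [d for d in docs if d[1] == cls]
    return sorted(doc_freq(docs))

def local_selection(docs, k, kind="local"):
    """One feature set per class: the top-k candidates by class score."""
    selected = {}
    for cls in sorted({c for _, c in docs}):
        candidates = dictionary(cls, docs, kind)
        ranked = sorted(candidates, key=lambda t: (-score(t, cls, docs), t))
        selected[cls] = ranked[:k]
    return selected

def global_selection(docs, k):
    """A single shared feature set: rank terms by their best class score."""
    classes = sorted({c for _, c in docs})
    candidates = sorted(doc_freq(docs))
    ranked = sorted(
        candidates,
        key=lambda t: (-max(score(t, c, docs) for c in classes), t),
    )
    return ranked[:k]

if __name__ == "__main__":
    print(local_selection(DOCS, 2))                    # per-class feature sets
    print(local_selection(DOCS, 2, kind="universal"))  # same, wider candidate pool
    print(global_selection(DOCS, 2))                   # one shared feature set
```

In this sketch, local selection returns as many feature sets as there are classes, while global selection returns one. Because the assumed score only rewards terms that are frequent within the class, the local and universal dictionaries happen to give the same top terms on this toy data; a two-sided metric such as chi-square can also rank terms from irrelevant documents highly, which is where the two dictionaries would be expected to diverge.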