Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Feature Selection Strategy in Text Classification

Abstract

Traditionally, the number of features to keep is determined by a so-called "rule of thumb" or by using a separate validation dataset. Neither approach explains why the chosen number is the best one, and we have no formal feature selection model for obtaining it. In this paper, we conduct an in-depth empirical analysis and argue that simply selecting the features with the highest scores may not be the best strategy. A highest-scores approach reduces many documents to zero length, because none of their terms survive selection, so they cannot contribute to the training process. Accordingly, we formulate feature selection as a dual-objective optimization problem and automatically identify the best number of features for each document. Extensive experiments are conducted to verify our claims, and the encouraging results indicate that the proposed framework is effective.
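The zero-length effect described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy corpus, the scoring function (absolute difference in per-class document frequency), and the hypothetical `min_per_doc` floor. It is not the paper's dual-objective formulation, which the abstract does not spell out; it only contrasts a global top-k cut with a naive rule that guarantees each document keeps at least one feature.

```python
def term_scores(docs, labels):
    """Toy score (an assumption): |P(term | class 1) - P(term | class 0)| over documents."""
    pos = [set(d) for d, y in zip(docs, labels) if y == 1]
    neg = [set(d) for d, y in zip(docs, labels) if y == 0]
    vocab = set().union(*pos, *neg)
    return {
        t: abs(sum(t in d for d in pos) / len(pos) - sum(t in d for d in neg) / len(neg))
        for t in vocab
    }


def select_global_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def project(docs, features):
    """Drop every term that is not a selected feature."""
    return [[t for t in d if t in features] for d in docs]


def select_with_per_document_floor(docs, scores, k, min_per_doc=1):
    """Hypothetical alternative: global top-k, then top up documents left too short."""
    features = select_global_top_k(scores, k)
    for d in docs:
        kept = [t for t in d if t in features]
        missing = min_per_doc - len(kept)
        if missing > 0:
            # Add this document's own best-scoring terms that are not yet selected.
            candidates = sorted(
                (t for t in set(d) if t not in features),
                key=lambda t: (-scores[t], t),
            )
            features.update(candidates[:missing])
    return features


# Toy corpus (made up): two documents of class 1, three of class 0.
docs = [
    ["cheap", "pills", "offer"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "notes"],
    ["meeting", "notes", "today"],
    ["lunch", "tomorrow"],
]
labels = [1, 1, 0, 0, 0]

scores = term_scores(docs, labels)
global_features = select_global_top_k(scores, k=2)
floor_features = select_with_per_document_floor(docs, scores, k=2)

empty_global = sum(1 for d in project(docs, global_features) if not d)
empty_floor = sum(1 for d in project(docs, floor_features) if not d)
print("empty documents, global top-k:", empty_global)
print("empty documents, per-document floor:", empty_floor)
```

On this toy data the global cut keeps only terms from one class, so every document of the other class becomes empty, while the floor rule adds a few lower-scoring terms to keep them usable. How to trade off score against coverage is exactly the question the paper frames as an optimization problem.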
