Conference: IEEE International Conference on Tools with Artificial Intelligence

Discovering Highly Informative Feature Set over High Dimensions

Abstract

For many textual collections, the number of features is often very large. Many of these features are redundant, so it is desirable to have a small, succinct, yet highly informative set of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a feature set. The main contribution of this paper is to improve the efficiency of selecting the most informative feature set from high-dimensional unlabeled data. We propose a heuristic theory for selecting informative feature sets from high-dimensional data. We also design data structures that allow the entropies of candidate feature sets to be computed efficiently, and we develop a simple pruning strategy that eliminates hopeless candidates at each forward-selection step. Experiments on real-world datasets show that the proposed method is very efficient.
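The abstract does not spell out the authors' heuristic, data structures, or pruning rule, so the following is only a rough illustration of the general idea under stated assumptions. It sketches generic greedy forward selection that maximizes the joint entropy H(S) of a set of discrete features over unlabeled rows (for example, a 0/1 document-term matrix). Two choices here are assumptions of this sketch, not the paper's method: entropies are computed incrementally by refining a partition of the rows, and candidates are pruned with the subadditivity bound H(S ∪ {f}) ≤ H(S) + H(f). The function names `entropy_of_labels` and `forward_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

# Small tolerance so floating-point rounding never prunes a candidate that could win.
EPS = 1e-12


def entropy_of_labels(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def forward_select(rows: List[List[int]], k: int) -> List[int]:
    """Greedily pick up to k feature indices that maximize the joint entropy H(S).

    Hypothetical sketch: `rows` is a non-empty list of equal-length lists of
    discrete values (one list per document, one value per feature).
    """
    n_features = len(rows[0])
    # Assumed data structure: the group id of each row under the current selection S.
    # Rows share a group id exactly when they agree on every selected feature.
    groups = [0] * len(rows)
    selected: List[int] = []
    h_selected = 0.0
    # Single-feature entropies H(f), needed for the bound H(S + f) <= H(S) + H(f).
    h_single = [entropy_of_labels([r[j] for r in rows]) for j in range(n_features)]

    for _ in range(k):
        best_f, best_h = None, h_selected
        candidates = [j for j in range(n_features) if j not in selected]
        # Visit candidates in descending H(f) so the upper bound also decreases.
        candidates.sort(key=lambda j: (-h_single[j], j))
        for f in candidates:
            if h_selected + h_single[f] + EPS <= best_h:
                # Assumed pruning rule: this candidate and every later one
                # cannot beat the current best, so stop scanning.
                break
            # H(S + f) is the entropy of the refined partition (group, value of f).
            h = entropy_of_labels([(g, r[f]) for g, r in zip(groups, rows)])
            if h > best_h:
                best_f, best_h = f, h
        if best_f is None:
            break  # no remaining feature strictly increases the joint entropy
        selected.append(best_f)
        h_selected = best_h
        # Refine the partition by relabeling each (old group, value) pair compactly.
        ids: dict = {}
        groups = [ids.setdefault((g, r[best_f]), len(ids)) for g, r in zip(groups, rows)]
    return selected


if __name__ == "__main__":
    # Columns 0 and 1 are duplicates, column 2 is independent, column 3 is constant.
    data = [[0, 0, 0, 0], [0, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
    print(forward_select(data, 3))
```

On this toy input the sketch keeps one copy of the duplicated feature and then the independent one, and stops there, because adding the duplicate or the constant column would not increase H(S). Sorting candidates by H(f) lets the bound cut the scan short as soon as it falls below the best entropy found so far.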