Discovering Highly Informative Feature Set over High Dimensions

Abstract

For many textual collections, the number of features is often overly large. Because these features can be highly redundant, it is desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory is one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient.
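
The forward selection step with entropy scoring and candidate pruning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify their heuristic, data structures, or pruning rule. The sketch assumes discrete feature values, estimates joint entropy with a plain `Counter`, and prunes with the standard subadditivity bound H(S ∪ {f}) ≤ H(S) + H(f). The names `joint_entropy` and `greedy_informative_set` are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    n = len(rows)
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Pick up to k feature indices by greedy forward selection on joint entropy.

    Assumption: this is a generic greedy scheme, not the paper's algorithm.
    """
    n_features = len(rows[0])
    # Single-feature entropies, reused for the pruning bound below.
    single = [joint_entropy(rows, [f]) for f in range(n_features)]
    selected, current = [], 0.0
    for _ in range(min(k, n_features)):
        best_f, best_h = None, -1.0
        # Visit candidates in decreasing single-feature entropy so that
        # the bound can cut off the rest of the list early.
        candidates = sorted(
            (f for f in range(n_features) if f not in selected),
            key=lambda f: -single[f],
        )
        for f in candidates:
            # Subadditivity: H(S + f) <= H(S) + H(f). If this bound cannot
            # beat the best candidate so far, no later candidate can either.
            if current + single[f] <= best_h:
                break
            h = joint_entropy(rows, selected + [f])
            if h > best_h:
                best_f, best_h = f, h
        selected.append(best_f)
        current = best_h
    return selected, current


# Columns 0 and 2 are identical (redundant); column 1 is independent of them.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(greedy_informative_set(rows, 2))  # ([0, 1], 2.0)
```

In this toy input the redundant column 2 is never scored in the second round, because its bound of 2.0 bits cannot exceed the 2.0 bits already reached by adding column 1.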

Bibliographic record

Source
IEEE International Conference on Tools with Artificial Intelligence, 2012, pp. 1059-1064 (6 pages)
Authors
Zhang Chongsheng; Masseglia Florent; Zhang Xiangliang
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
Keywords
Feature selection; unsupervised; high dimensions

