Feature selection for robust knowledge discovery from data.

Abstract

Modern information systems provide the ability to record, store, retrieve, and transmit massive amounts of data, and such systems have become a routine part of daily life for many people. While computers are adept at handling information, it has long been the goal of the knowledge discovery from data (KDD) community to enable computers to extract meaningful knowledge from this information, knowledge that would otherwise be lost to humans in the sheer volume of data. Feature selection techniques are often used in this discovery process to help combat the "curse of dimensionality," the tremendous sample requirements that occur with high-dimensional data. Traditional feature selection algorithms consider only data sets with many available samples, in which the distribution of samples is assumed to accurately represent the entire population (an unbiased sample set). There are many situations where these assumptions do not hold, causing existing feature selection techniques to break down and, in turn, preventing automated KDD processes from being used. Each of these situations presents its own unique challenges to feature selection. In this dissertation, I analyze these challenges and develop feature selection algorithms that perform robustly, specifically on small sample-size problems and in domains where biased data is unavoidable. In small sample-size data, results show that traditional feature selection algorithms produce unstable selected subsets; that is, subset membership changes with perturbations to the sample set. This reduces confidence that the selected subsets are truly relevant to the learning target and casts doubt on any extracted knowledge. Using traditional feature selection techniques on biased data leads to heavily biased models that are not informative with respect to the entire problem. In dynamic situations where biased data is encountered, such as in reinforcement learning, this can prevent learning from proceeding at all, breaking the KDD process. In this dissertation, I present several techniques for overcoming these issues and demonstrate their effectiveness on diverse applications. Furthermore, I show that the methods I developed can significantly outperform state-of-the-art feature selection algorithms in each application.
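The subset instability that the abstract describes for small samples can be made concrete with a small experiment. The sketch below is not the dissertation's method. It assumes a simple correlation-based filter as a stand-in for a traditional selector, bootstrap resampling as the perturbation of the sample set, and mean pairwise Jaccard similarity as the stability score. All function names and the synthetic data generator are hypothetical, chosen only for illustration.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_top_k(rows, labels, k):
    """Assumed filter selector: keep the k features most correlated with the label."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def subset_stability(rows, labels, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard similarity of subsets selected on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(n_resamples):
        # Perturb the sample set by drawing n samples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_data(n_samples, n_features, n_relevant, seed):
    """Synthetic data: the label is the sum of the first n_relevant features plus noise."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    labels = [sum(r[:n_relevant]) + rng.gauss(0, 1) for r in rows]
    return rows, labels


if __name__ == "__main__":
    # Compare a small and a large sample drawn from the same synthetic process.
    for n_samples in (15, 300):
        rows, labels = make_data(n_samples, n_features=50, n_relevant=5, seed=1)
        print(n_samples, round(subset_stability(rows, labels, k=5), 3))
```

A score near 1 means the selector returns nearly the same subset under every perturbation. A score near 0 means subset membership depends heavily on which samples happened to be drawn, which is the situation the abstract associates with small sample sizes.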

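Bibliographic record

Author: Loscalzo, Steven
Author affiliation: State University of New York at Binghamton
Degree-granting institution: State University of New York at Binghamton
Subject: Computer Science
Degree: Ph.D.
Year: 2012
Pages: 186
Original format: PDF
Language: English
Chinese Library Classification: Aquaculture, Fisheries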
