High-dimensional feature selection for genomic datasets

Afshar Majid; Usefi Hamid

Abstract

A central problem in machine learning and pattern recognition is identifying the most important features. In this paper, we propose a new feature selection method (DRPT) that first removes irrelevant features and then detects correlations among the remaining features. Let $D = [A \mid b]$ be a dataset, where $b$ is the class label and $A$ is a matrix whose columns are the features. We solve $Ax = b$ by least squares, using the pseudo-inverse of $A$. Each component of $x$ can be viewed as a weight assigned to the corresponding column (feature). We define a threshold based on the local maxima of $x$ and remove those features whose weights are smaller than the threshold. To detect correlations in the reduced matrix, which we still call $A$, we consider a perturbation $\tilde{A}$ of $A$. We prove that correlations are encoded in $\Delta x = |x - \tilde{x}|$, where $\tilde{x}$ is the least squares solution of $\tilde{A}\tilde{x} = b$. We cluster features first based on $\Delta x$ and then using the entropy of the features. Finally, a feature is selected from each sub-cluster based on its weight and entropy. The effectiveness of DRPT has been verified through a series of comparisons with seven state-of-the-art feature selection methods on ten genetic datasets ranging from 9,117 to 267,604 features. The results show that, overall, DRPT performs favorably in several respects compared with each of the other feature selection algorithms. (C) 2020 Elsevier B.V. All rights reserved.
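The first two stages described above (least-squares weights from the pseudo-inverse, then the perturbation difference $\Delta x$) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' DRPT implementation: the threshold rule (mean of the local maxima of $|x|$, read in column order), the Gaussian form and scale `eps` of the perturbation, and all function names are hypothetical, and the clustering on $\Delta x$ and entropy is omitted.

```python
import numpy as np


def least_squares_weights(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Minimum-norm least-squares solution x = A^+ b via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(A) @ b


def local_maxima_threshold(x: np.ndarray) -> float:
    """HYPOTHETICAL threshold: mean of the local maxima of |x|.

    The abstract only says the threshold is "based on the local maxima of x";
    averaging the peaks (taken in column order) is an assumption for illustration.
    """
    w = np.abs(x)
    # Interior entries strictly larger than both neighbours.
    interior = (w[1:-1] > w[:-2]) & (w[1:-1] > w[2:])
    peaks = w[1:-1][interior]
    if peaks.size == 0:  # no interior peak: fall back to the global maximum
        return float(w.max())
    return float(peaks.mean())


def remove_irrelevant(A: np.ndarray, b: np.ndarray):
    """Keep only the columns whose |weight| reaches the (assumed) threshold."""
    x = least_squares_weights(A, b)
    keep = np.flatnonzero(np.abs(x) >= local_maxima_threshold(x))
    return keep, A[:, keep]


def perturbation_delta(A: np.ndarray, b: np.ndarray, eps: float = 1e-6, seed: int = 0) -> np.ndarray:
    """Delta x = |x - x_tilde|, where x_tilde is the least-squares solution of A_tilde x_tilde = b.

    ASSUMPTION: A_tilde = A + eps * (standard Gaussian noise); the abstract does
    not specify the form of the perturbation.
    """
    rng = np.random.default_rng(seed)
    A_tilde = A + eps * rng.standard_normal(A.shape)
    x = least_squares_weights(A, b)
    x_tilde = least_squares_weights(A_tilde, b)
    return np.abs(x - x_tilde)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.standard_normal((60, 8))                  # toy feature matrix
    b = rng.integers(0, 2, size=60).astype(float)     # toy binary class labels
    keep, A_red = remove_irrelevant(A, b)
    print("kept feature indices:", keep)
    print("Delta x on reduced matrix:", perturbation_delta(A_red, b))
```

Because the mean of the local maxima never exceeds the largest weight, this threshold always keeps at least one feature. Which entries of $\Delta x$ signal correlated features, and how they are grouped into clusters, is left out of this sketch.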

Bibliographic details

Source
Knowledge-Based Systems | 2020, No. 28 | 106370.1-106370.11 | 11 pages
Authors
Afshar Majid; Usefi Hamid
Author affiliations

Memorial University of Newfoundland, Department of Computer Science, St. John's, NF, Canada

Memorial University of Newfoundland, Department of Mathematics & Statistics, St. John's, NF, Canada

Record information
Original format: PDF
Full-text language: English
Keywords
Feature selection; Dimensionality reduction; Perturbation theory; Singular value decomposition; Disease diagnoses; Classification

Similar documents

1. A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets [J]. Sayed Sabah, Nassef Mohammad, Badr Amr. Expert Systems with Applications, 2019, May.

2. Hybrid binary Coral Reefs Optimization algorithm with Simulated Annealing for Feature Selection in high-dimensional biomedical datasets [J]. Yan Chaokun, Ma Jingjing, Luo Huimin. Chemometrics and Intelligent Laboratory Systems, 2019.

3. A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets [J]. Raman M. R. Gauthama, Nivethitha Somu, Kannan Krithivasan. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 2019, No. 23.

4. A Centre of Gravity-Based Preprocessing Approach for Feature Selection Using Artificial Bee Colony Algorithm on High-Dimensional Datasets [C]. M. G. Bindu, M. K. Sabu. International Conference on Communication Systems and Networks, 2019.

5. Robust and efficient feature selection for high-dimensional datasets [D]. Mo, Dengyao. 2011.

6. Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets [O]. Muhammad Umar Chaudhry, Muhammad Yasir, Muhammad Nabeel Asghar. 2020.
