A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization

Abstract

Developing an efficient and robust feature selection method for gene expression profile data, which contain thousands of genes but only a small number of samples, is an important and active research topic. At present, most feature selection methods build their models from all samples of the gene expression data, without considering the influence of outlier samples or the distribution of the samples. Moreover, cancer is well known to be a heterogeneous disease: cancer tissue samples from the same organ can belong to many subtypes with different molecular characteristics. Models should therefore be built from samples that share the same genetic characteristics. In this article, we propose a novel and efficient feature selection approach based on localized samples to extract gene signatures more accurately. First, we constructed a sample-sample similarity network by computing, from gene expression values, the Euclidean distance between each central (target) sample and every other sample, and picked out the nearest samples within a certain range of each target sample. Second, we built co-expression networks from the top nearest samples and applied Random Walk with Restart (RWR) to obtain a steady-state probability network. Finally, by partitioning this network and comparing five selection strategies, we obtained the localized samples that give the best cancer classification. We applied our method to six datasets covering different cancer types. Using SVM classifiers with leave-one-out cross-validation (LOOCV), the average accuracies obtained with the top 100 genes selected by the method are 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. These results show that the proposed method achieves high classification accuracy (above 94%) on all six datasets, indicating that it is effective and applicable.
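The sample-localization idea above (pick the nearest samples by Euclidean distance, then score them with Random Walk with Restart) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the neighborhood size, the edge weighting, the restart probability, or how the co-expression network is built, so `k`, the `1 / (1 + distance)` edge weight, `restart=0.7`, and the helper names `nearest_samples`, `rwr`, and `localize` are all hypothetical. For simplicity, RWR is run here on a sample-sample similarity network seeded at the target sample rather than on the paper's co-expression network.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (same gene order)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_samples(samples, target, k):
    """Indices of the k samples closest to samples[target], excluding the target."""
    dists = sorted(
        (euclidean(samples[target], s), i)
        for i, s in enumerate(samples)
        if i != target
    )
    return [i for _, i in dists[:k]]


def rwr(adj, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Random Walk with Restart on a weighted, undirected graph.

    Iterates p <- (1 - restart) * W p + restart * e, where W is the
    column-normalized adjacency matrix and e is the indicator vector of
    `seed`, until the L1 change between iterations drops below `tol`.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        new_p = [
            (1 - restart)
            * sum(adj[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            + restart * e[i]
            for i in range(n)
        ]
        diff = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if diff < tol:
            break
    return p


def localize(samples, target, k, restart=0.7):
    """Hypothetical localization: k nearest neighbors, then RWR ranking.

    Returns the neighbor indices ordered by steady-state probability
    (highest first) and the probability vector over [target] + neighbors.
    """
    idx = [target] + nearest_samples(samples, target, k)
    sub = [samples[i] for i in idx]
    m = len(sub)
    # Assumed edge weight 1 / (1 + distance); the paper's weighting is not given.
    adj = [
        [0.0 if a == b else 1.0 / (1.0 + euclidean(sub[a], sub[b])) for b in range(m)]
        for a in range(m)
    ]
    p = rwr(adj, seed=0, restart=restart)
    ranked = sorted(range(1, m), key=lambda a: -p[a])
    return [idx[a] for a in ranked], p


if __name__ == "__main__":
    # Toy data: rows are samples, columns are genes (made-up values).
    samples = [
        [1.0, 2.0, 3.0],  # target sample
        [1.1, 2.1, 2.9],
        [0.9, 1.8, 3.2],
        [5.0, 5.0, 5.0],
        [9.0, 0.0, 1.0],  # outlier-like sample
    ]
    ranked, p = localize(samples, target=0, k=3)
    print("neighbors by RWR probability:", ranked)
    print("steady-state probabilities:", [round(x, 4) for x in p])
```

With the toy data, the outlier-like sample 4 is dropped by the nearest-neighbor step, and the remaining neighbors are ordered by how often a walk that keeps restarting at the target visits them; a larger `restart` concentrates more probability mass near the target.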

Bibliographic information

Source: International Conference on Biological Engineering and Pharmacy, 2017, p. 411 (6 pages)
Authors: Mingyue SHENG; Yuan TIAN; Wei DU; Yanchun LIANG
Original format: PDF
Chinese Library Classification: Q81-53
Keywords: Feature selection; Samples localization; Cancer classification

Similar literature

1. A method of dual-process sample selection for feature selection on gene expression data [J]. Quanjin Liu, Zhimin Zhao, Ying-xin Li. International Journal of Physical Sciences, 2013, No. 17.

2. Microarray gene-expression data classification using less gene expressions by combining feature selection methods and classifiers [J]. Aarti Bhalla, R. K. Agrawal. International Journal of Information Engineering and Electronic Business, 2013, No. 5.

3. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method [J]. Li L, Weinberg CR, Darden TA. Bioinformatics, 2001, No. 12.

4. A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization [C]. Mingyue SHENG, Yuan TIAN, Wei DU. International Conference on Biological Engineering and Pharmacy, 2017.

5. Feature selection in microarray gene expression data analysis and contagious viral strain computational genotyping [D]. Cai, Zhipeng. 2008.

6. Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods [O]. Shuhei Kimura, Ryo Fukutomi, Masato Tokuhisa. 2020.

7. Microarray Gene-expression Data Classification using Less Gene Expressions by Combining Feature Selection Methods and Classifiers [O]. Aarti Bhalla, R. K. Agrawal. 2013.
