Conference: International Conference on Biological Engineering and Pharmacy

A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization



Abstract

Developing an efficient and robust feature selection method for gene expression profile data, which typically contain thousands of genes but only a small number of samples, is an important and active research topic. At present, most feature selection methods build their models on all samples of the gene expression data, without considering the influence of outlier samples or the distribution of the samples. Moreover, cancer is well known to be a heterogeneous disease, and cancer tissue samples from the same organ fall into many subtypes with different molecular characteristics. Models should therefore be built from samples that share the same genetic characteristics. In this article, we propose a novel and efficient feature selection approach based on localized samples to extract gene signatures more accurately. First, for each target sample we pick out the nearest samples within a certain range by constructing a sample-sample similarity network, in which the Euclidean distance between the central sample and every other sample is calculated from gene expression values. Second, we establish co-expression networks by selecting the top nearest samples, and form a steady-state probability network by applying the Random Walk with Restart (RWR) method. Finally, by dividing this network and comparing five selection strategies, we obtain the localized samples that give the best cancer classification. We applied our method to six datasets covering different cancer types. Using the top 100 genes, the average accuracies of SVM classifiers under leave-one-out cross validation (LOOCV) are 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. These results show that the proposed method performs well on these datasets and indicate that it is effective and applicable.
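The localization steps described in the abstract (Euclidean distances between samples, a network over the top nearest samples, and RWR steady-state probabilities) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the symmetric k-nearest-neighbour graph, the restart probability of 0.5, and the rule of keeping the highest-probability samples are all assumptions, since the abstract does not specify these details or the five selection strategies.

```python
import numpy as np


def pairwise_euclidean(X):
    """Euclidean distance between every pair of rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def knn_adjacency(D, k):
    """Symmetric 0/1 adjacency linking each sample to its k nearest samples.

    Assumption: the abstract only says "top nearest samples"; a k-NN graph
    is one plausible reading.
    """
    n = D.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        order = [j for j in np.argsort(D[i]) if j != i]
        A[i, order[:k]] = 1.0
    return np.maximum(A, A.T)


def rwr(A, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random Walk with Restart: iterate p = (1 - r) W p + r p0 to convergence."""
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = A / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def localized_samples(X, target, k=3, restart=0.5, size=5):
    """Indices of the `size` samples with the highest RWR probability.

    Assumption: ranking by steady-state probability stands in for the
    paper's unspecified selection strategies.
    """
    D = pairwise_euclidean(X)
    A = knn_adjacency(D, k)
    p = rwr(A, target, restart=restart)
    return np.argsort(-p)[:size]


# Toy data: two well-separated groups of 5 samples, 20 "genes" each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (5, 20)), rng.normal(10.0, 1.0, (5, 20))])
local = localized_samples(X, target=0, k=2, size=5)
```

On this toy input, the samples selected for target 0 all come from its own group, because the k-nearest-neighbour graph never links the two groups and the random walk cannot leave the target's component. Feature selection and SVM classification would then be run on the localized samples only.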
