International Conference on Biological Engineering and Pharmacy

A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization

Abstract

Developing an efficient and robust feature selection method for gene expression profile data, which contain thousands of genes but only a small number of samples, is an important and active research topic. Most current feature selection methods build their models from all samples in the gene expression data and do not take into account the influence of outlier samples or the distribution of the samples. Moreover, cancer is well known to be a heterogeneous disease: cancer tissue samples from the same organ can belong to many different molecular subtypes. Models should therefore be built from samples that share the same genetic characteristics. In this article, we propose a novel and efficient feature selection approach based on localized samples that extracts gene signatures more accurately. For each target sample, we select the nearest samples within a certain range and obtain the best localized samples by constructing a sample-sample similarity network. First, the Euclidean distance between each central sample and the other samples is computed from their gene expression values. Second, co-expression networks are built from the top nearest samples, and Random Walk with Restart (RWR) is applied to obtain a steady-state probability network. Finally, by partitioning this network and comparing five selection strategies, we obtain the localized samples that give the best cancer classification. We applied our method to six datasets covering different cancer types. Using the top 100 genes selected by the method, SVM classifiers evaluated with leave-one-out cross-validation (LOOCV) achieved average accuracies of 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. These results show that the proposed method performs well on these datasets and indicate that it is effective and applicable.
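The sample-localization idea in the abstract (Euclidean nearest samples, then a Random Walk with Restart on a network) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The Gaussian-kernel sample-sample similarity graph, the parameters `k`, `sigma` and `restart`, and all helper names are hypothetical choices, and the sketch simply ranks the localized samples by RWR steady-state probability rather than reproducing the paper's co-expression networks or its five selection strategies, which the abstract does not specify.

```python
import numpy as np


def euclidean_distances(X):
    """Pairwise Euclidean distances between rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by rounding


def nearest_samples(D, center, k):
    """Indices of the k samples closest to `center`, excluding the center itself."""
    order = np.argsort(D[center], kind="stable")
    return [int(i) for i in order if i != center][:k]


def random_walk_with_restart(A, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state RWR probabilities on a graph with non-negative adjacency A.

    Iterates p <- (1 - restart) * W p + restart * e, where W is the
    column-normalised adjacency matrix and e is the indicator vector of `seed`.
    """
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = A / col_sums
    e = np.zeros(A.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def localized_samples(X, center, k, sigma=1.0, restart=0.3):
    """Hypothetical helper: rank the k nearest samples of `center` by RWR score.

    Assumption: similarity between samples is a Gaussian kernel of their
    Euclidean distance; the paper's actual network construction may differ.
    """
    D = euclidean_distances(X)
    nodes = [center] + nearest_samples(D, center, k)
    sub = D[np.ix_(nodes, nodes)]
    A = np.exp(-sub ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    p = random_walk_with_restart(A, seed=0, restart=restart)  # node 0 is the center
    order = np.argsort(-p[1:], kind="stable")  # highest probability first
    return [nodes[1 + i] for i in order], p


# Toy data (synthetic, not from the paper): two well-separated groups of 5
# samples x 20 genes, standing in for two molecular subtypes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 20)),
               rng.normal(3.0, 0.1, size=(5, 20))])
ranked, p = localized_samples(X, center=0, k=4)
print(sorted(ranked))  # [1, 2, 3, 4]: the samples from sample 0's own group
```

Restricting the walk to the k nearest samples keeps outlier samples from the other group out of the localized set, which mirrors the motivation given in the abstract; the restart probability controls how strongly the walk stays concentrated around the central sample.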