Published in: IEEE Journal of Biomedical and Health Informatics

Graph-Based Hub Gene Selection Technique Using Protein Interaction Information: Application to Sample Classification

Abstract

Classifying samples from gene expression profiles plays a significant role in the prediction and diagnosis of diseases. For this task, a robust feature selection algorithm is essential for identifying the important genes in high-dimensional gene expression data. This paper combines protein-protein interaction information with a graph mining technique to find a suitable subset of features (genes), which is then used for sample classification. Our contribution to feature selection is threefold. First, all genes are grouped into clusters based on the integrated information of their gene expression values and protein interactions, using a clustering approach based on multi-objective optimization. Second, the confidence scores of the protein interactions are incorporated into a popular graph mining algorithm, the Goldberg algorithm, to find the relevant features. These features are topologically and functionally significant genes, referred to as hub genes. Finally, the hub genes are identified by varying the degrees of the nodes and are then used for the sample classification task. Several machine learning classifiers are applied for this purpose, and classification performance is measured using accuracy, sensitivity, specificity, precision, F-measure, and the Matthews correlation coefficient. Comparative analysis against two baselines and several existing approaches demonstrates the effectiveness of the proposed approach. Furthermore, the robustness of the identified hub-gene modules is supported by biological significance analysis.
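
The six performance metrics named in the abstract all follow from the four counts of a binary confusion matrix. The sketch below is only an illustration using the standard textbook definitions; it is not taken from the paper. The function name `classification_metrics`, the dictionary keys, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute six binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Any ratio whose denominator is
    zero is reported as 0.0 (an assumption of this sketch).
    """

    def safe_div(num, den):
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    # F-measure: harmonic mean of precision and sensitivity
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts symmetrically, so it stays informative when the two sample classes are imbalanced, which is common in gene expression datasets.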