Journal: Bioinformatics

Robust and efficient identification of biomarkers by classifying features on graphs


Abstract

Motivation: A central problem in biomarker discovery from large-scale gene expression or single nucleotide polymorphism (SNP) data is the computational challenge of taking into account the dependence among all the features. Methods that ignore this dependence usually identify biomarkers that are not reproducible across independent datasets. We introduce a new graph-based semi-supervised feature classification algorithm that identifies discriminative disease markers by learning on bipartite graphs. Our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral. It uses network propagation to capture the dependence among both samples and features (clinical and genetic variables) by exploring bi-cluster structures in the graph. The algorithm has two main properties: (1) it can find a globally optimal labeling that captures the dependence among all the features, and thus generates highly reproducible results across independent microarray or other high-throughput datasets; (2) it is capable of handling hundreds of thousands of features, and thus is particularly useful for biomarker identification from high-throughput gene expression and SNP data. In addition, although designed for classifying features, our algorithm can also simultaneously classify test samples for disease prognosis/diagnosis.

Results: We applied the network propagation algorithm to study three large-scale breast cancer datasets. Our algorithm achieved competitive classification performance compared with SVMs and other baseline methods, and identified several markers with clinical or biological relevance to the disease. More importantly, our algorithm also identified highly reproducible marker genes and enriched functions from the independent datasets.
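The abstract does not state the update equations, so the following is only a minimal sketch of the general idea: propagating sample labels across a bipartite sample-feature graph and then calling each feature positive, negative or neutral. It uses a generic label propagation scheme with symmetric degree normalization, which is an assumption and not necessarily the authors' formulation. The function names `propagate_bipartite` and `label_features`, the parameters `alpha` and `threshold`, and the toy matrix are all hypothetical.

```python
import math

def propagate_bipartite(X, sample_labels, alpha=0.5, n_iter=200, tol=1e-9):
    """Propagate labels on a bipartite sample-feature graph (illustrative only).

    X[i][j] is a non-negative edge weight between sample i and feature j.
    sample_labels[i] is +1, -1, or 0 (0 marks an unlabeled test sample).
    """
    n, m = len(X), len(X[0])
    row_deg = [sum(row) for row in X]
    col_deg = [sum(X[i][j] for i in range(n)) for j in range(m)]
    # Symmetric normalization: S = D_s^{-1/2} X D_f^{-1/2} (assumed scheme).
    S = [[X[i][j] / math.sqrt(row_deg[i] * col_deg[j])
          if row_deg[i] > 0 and col_deg[j] > 0 else 0.0
          for j in range(m)] for i in range(n)]

    f_samp = [float(y) for y in sample_labels]
    f_feat = [0.0] * m  # features start with no label information
    for _ in range(n_iter):
        # Features collect scores from the samples they are connected to.
        new_feat = [alpha * sum(S[i][j] * f_samp[i] for i in range(n))
                    for j in range(m)]
        # Samples collect scores from features, anchored to their known labels.
        new_samp = [alpha * sum(S[i][j] * new_feat[j] for j in range(m))
                    + (1 - alpha) * sample_labels[i] for i in range(n)]
        delta = max(max(abs(a - b) for a, b in zip(new_feat, f_feat)),
                    max(abs(a - b) for a, b in zip(new_samp, f_samp)))
        f_feat, f_samp = new_feat, new_samp
        if delta < tol:
            break
    return f_feat, f_samp

def label_features(scores, threshold=0.05):
    """Map propagated scores to three classes; the threshold is arbitrary."""
    return ["positive" if s > threshold else
            "negative" if s < -threshold else "neutral" for s in scores]

# Toy data (hypothetical): 4 samples x 3 features.
X = [
    [1.0, 0.0, 0.5],  # sample 0, labeled +1
    [0.0, 1.0, 0.5],  # sample 1, labeled -1
    [0.9, 0.1, 0.5],  # sample 2, unlabeled, resembles sample 0
    [0.1, 0.9, 0.5],  # sample 3, unlabeled, resembles sample 1
]
sample_labels = [1, -1, 0, 0]

feature_scores, sample_scores = propagate_bipartite(X, sample_labels)
feature_calls = label_features(feature_scores)
print(feature_calls)
print([round(s, 3) for s in sample_scores])
```

In this sketch the sign of each unlabeled sample's score serves as its predicted class, which mirrors the abstract's remark that test samples can be classified alongside the features. A real dataset with hundreds of thousands of features would need sparse matrix operations rather than nested Python lists.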