International Conference on Pattern Recognition in Bioinformatics

Semi-Supervised Graph Embedding Scheme with Active Learning (SSGEAL): Classifying High Dimensional Biomedical Data

Abstract

In this paper, we present a new dimensionality reduction (DR) method, SSGEAL, which integrates Graph Embedding (GE) with semi-supervised and active learning to provide a low-dimensional data representation that allows for better class separation. Unsupervised DR methods such as Principal Component Analysis and GE have previously been applied to the classification of high-dimensional biomedical datasets (e.g., DNA microarrays and digitized histopathology) in the reduced-dimensional space. However, these methods do not incorporate class label information, often leading to embeddings with significant overlap between the data classes. Semi-supervised dimensionality reduction (SSDR) methods, which use both labeled and unlabeled instances to learn the optimal low-dimensional embedding, have recently been proposed. However, in several problems involving biomedical data, obtaining class labels may be difficult and/or expensive. SSGEAL uses labels from instances identified as "hard to classify" by a support vector machine-based active learning algorithm to drive an updated SSDR scheme while reducing labeling cost. Real-world biomedical data from 7 gene expression studies and 3900 digitized images of prostate cancer needle biopsies were used to show the superior performance of SSGEAL compared to both GE and SSAGE (a recently popular SSDR method) in terms of both the Silhouette Index (SI) (SI = 0.35 for GE, SI = 0.31 for SSAGE, and SI = 0.50 for SSGEAL) and the Area Under the Receiver Operating Characteristic Curve (AUC) for a Random Forest classifier (AUC = 0.85 for GE, AUC = 0.93 for SSAGE, and AUC = 0.94 for SSGEAL).
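The abstract does not give implementation details, so the following is only a minimal NumPy sketch of an SSGEAL-style loop under loudly stated assumptions. A linear SVM fit by plain subgradient descent stands in for whatever SVM the authors used; "hard to classify" is taken to mean the unlabeled samples with the smallest absolute SVM decision value (margin-based uncertainty sampling); and the label-informed affinity rule inside `semi_supervised_embedding` (same-class labeled pairs get weight 1, different-class pairs weight 0) is a hypothetical stand-in, not the SSGEAL or SSAGE formulation. All function names are hypothetical. The graph embedding is a standard Laplacian-eigenmap solution of L v = λ D v, and the Silhouette Index uses its usual definition.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Linear soft-margin SVM (y in {-1, +1}) fit by full-batch subgradient descent
    on 0.5*||w||^2 + C*sum(hinge). A simple stand-in, not a tuned solver."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def select_hard_instances(w, b, X_unlabeled, k):
    """Indices of the k unlabeled samples nearest the SVM boundary (smallest |f(x)|)."""
    return np.argsort(np.abs(X_unlabeled @ w + b))[:k]


def semi_supervised_embedding(X, labels, n_components=2):
    """Laplacian-eigenmap graph embedding. `labels` is -1/+1 for labeled samples and
    0 for unlabeled ones. HYPOTHETICAL label rule: labeled same-class pairs get weight 1,
    labeled different-class pairs weight 0; other pairs keep a Gaussian affinity."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sigma = np.sqrt(np.median(sq[sq > 0]))
    W = np.exp(-sq / (2 * sigma ** 2))
    lab = labels != 0
    both = lab[:, None] & lab[None, :]
    same = labels[:, None] == labels[None, :]
    W[both & same] = 1.0
    W[both & ~same] = 0.0
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Largest eigenvalues of M = smallest of the normalized Laplacian; skip the trivial one.
    top = vecs[:, ::-1][:, 1:n_components + 1]
    return d_inv_sqrt[:, None] * top  # back-transform to solutions of L v = lambda D v


def silhouette_index(Y, labels):
    """Mean silhouette width; s_i = (b_i - a_i) / max(a_i, b_i), 0 for singletons."""
    dist = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    classes = np.unique(labels)
    s = np.zeros(len(Y))
    for i in range(len(Y)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s_i = 0 by convention
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in classes if c != labels[i])
        m = max(a, b)
        s[i] = (b - a) / m if m > 0 else 0.0
    return s.mean()


if __name__ == "__main__":
    # Synthetic two-class data: class signal lives in only 5 of 50 dimensions.
    rng = np.random.default_rng(0)
    n, d = 200, 50
    y_true = np.where(np.arange(n) < n // 2, -1, 1)
    X = rng.normal(size=(n, d))
    X[:, :5] += 1.5 * y_true[:, None]

    labels = np.zeros(n, dtype=int)  # 0 marks an unlabeled sample
    init = np.concatenate([rng.choice(np.flatnonzero(y_true == c), 5, replace=False)
                           for c in (-1, 1)])
    labels[init] = y_true[init]
    for rnd in range(3):
        idx_l, idx_u = np.flatnonzero(labels != 0), np.flatnonzero(labels == 0)
        w, b = train_linear_svm(X[idx_l], labels[idx_l].astype(float))
        hard = idx_u[select_hard_instances(w, b, X[idx_u], k=10)]
        labels[hard] = y_true[hard]  # simulated oracle supplies the queried labels
        Y = semi_supervised_embedding(X, labels)
        print(rnd, int((labels != 0).sum()), round(silhouette_index(Y, y_true), 3))
```

Querying only boundary samples is what keeps the labeling budget small: each round spends labels where the current classifier is least certain, and those labels then reshape the graph before the next embedding. The printed silhouette values on this synthetic data say nothing about the results reported above.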