...
Published in: Procedia Computer Science

K-means Clustering and Principal Components Analysis of Microarray Data of L1000 Landmark Genes


Abstract

Dimensionality reduction methods such as principal component analysis (PCA) are used to select relevant features, and k-means clustering performs well when applied to data with low effective dimensionality. This study integrated PCA and k-means clustering on the L1000 dataset, which contains gene microarray data for 978 landmark genes; these genes have previously been shown to predict the expression of ~81% of the remaining 21,290 target genes with low error. Groups within the L1000 dataset were characterized using both microarray data and clinical metadata to assess whether the 978 landmark genes improve clustering results compared with a random set of 978 genes. The role of clinical variables, including morphological diagnosis, was assessed across k-means clusters within homogeneous tissue samples in the L1000 dataset. Results show that the 978 landmark genes differentiated k-means clusters better than 978 randomly selected non-landmark genes. When plotted against the first two principal components, which capture a greater proportion of the variation for the landmark genes than for the random genes, k-means clusters generated from the landmark genes showed greater separation between cluster groups. These results suggest that the 978 landmark genes better represent the overall genetic profile of these heterogeneous samples. Future studies will apply predictive analytics techniques to further investigate the interaction between microarray data and clinical variables such as cancer stage.
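The abstract gives no implementation details, so the following is only a minimal NumPy sketch, under stated assumptions, of the kind of comparison it describes. The sketch runs PCA on a gene subset, runs k-means on the same subset, and then scores how well separated the clusters are in the PC1-PC2 plane, once for a "landmark" gene set and once for an equal-size random set of non-landmark genes. Several choices here are assumptions rather than the authors' method: clustering on the full gene subset instead of on PC scores, the choice of k, the between-cluster sum-of-squares "separation" score, and the hypothetical helper names `pca_scores`, `kmeans`, `separation_2d` and `compare_gene_sets`. The demo data are random numbers, not L1000 measurements.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Center a (samples x genes) matrix and project it onto its top PCs via SVD.

    Returns the sample scores on the first `n_components` PCs and the fraction
    of total variance captured by each of those PCs.
    """
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance ratio per component
    scores = Xc @ Vt[:n_components].T            # sample coordinates on PC1..PCn
    return scores, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's-algorithm k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every sample to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties out
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def separation_2d(scores, labels):
    """Between-cluster share of the total sum of squares in PC space (0..1).

    Hypothetical separation score; the paper does not state its own measure.
    """
    total = np.sum((scores - scores.mean(axis=0)) ** 2)
    within = sum(np.sum((scores[labels == j] - scores[labels == j].mean(axis=0)) ** 2)
                 for j in np.unique(labels))
    return 1.0 - within / total


def compare_gene_sets(X, landmark_idx, k=3, seed=0):
    """Compare a landmark gene set with an equal-size random non-landmark set.

    Returns {name: (variance fraction on PC1+PC2, separation of k-means
    clusters in the PC1-PC2 plane)}.
    """
    rng = np.random.default_rng(seed)
    others = np.setdiff1d(np.arange(X.shape[1]), landmark_idx)
    random_idx = rng.choice(others, size=len(landmark_idx), replace=False)
    results = {}
    for name, idx in (("landmark", landmark_idx), ("random", random_idx)):
        sub = X[:, idx]
        scores, explained = pca_scores(sub, n_components=2)
        labels = kmeans(sub, k, seed=seed)      # clusters from the gene subset
        results[name] = (float(explained.sum()), float(separation_2d(scores, labels)))
    return results


# Synthetic stand-in data only: random numbers, not L1000 expression values.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 500))     # 60 samples x 500 genes
landmark_idx = np.arange(50)       # hypothetical "landmark" column indices
print(compare_gene_sets(X, landmark_idx, k=3))
```

Because the demo matrix is pure noise, there is no reason to expect the two gene sets to differ here. Reproducing the paper's comparison would require a real expression matrix and the published landmark gene list in place of the synthetic `X` and `landmark_idx`.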