Published in: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology
Structure-Aware Principal Component Analysis for Single-Cell RNA-seq Data

Abstract

With the emergence of droplet-based technologies, it has now become possible to profile transcriptomes of several thousands of cells in a day. Although such a large single-cell cohort may favor the discovery of cellular heterogeneity, it also brings new challenges in the prediction of minority cell types. Identification of any minority cell type holds a special significance in knowledge discovery. In the analysis of single-cell expression data, the use of principal component analysis (PCA) is surprisingly frequent for dimension reduction. The principal directions obtained from PCA are usually dominated by the major cell types in the concerned tissue. Thus, it is very likely that using a traditional PCA may endanger the discovery of minority populations. To this end, we propose locality-sensitive PCA (LSPCA), a scalable variant of PCA equipped with structure-aware data sampling at its core. Structure-aware sampling provides PCA with a neutral spread of the data, thereby reducing the bias in its principal directions arising from the redundant samples in a data set. We benchmarked the performance of the proposed method on ten publicly available single-cell expression data sets including one very large annotated data set. Results have been compared with traditional PCA and PCA with random sampling. Clustering results on the annotated data sets also show that LSPCA can detect the minority populations with a higher accuracy.
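
The idea in the abstract (fit the principal directions on a redundancy-reduced subsample, then project every cell) can be illustrated with a minimal sketch. This is not the authors' LSPCA implementation: the abstract does not specify the sampling scheme, so the sketch assumes a simple stand-in, namely a Euclidean locality-sensitive hash (quantized random projections) with a cap on how many cells are kept per bucket. The function names (`lsh_buckets`, `capped_sample`, `sampled_pca`), all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def lsh_buckets(X, n_proj=2, width=1.0, seed=0):
    """Assign each row of X to a bucket by quantizing random projections."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((X.shape[1], n_proj))
    directions /= np.linalg.norm(directions, axis=0)  # unit-length axes
    offsets = rng.uniform(0.0, width, size=n_proj)
    cells = np.floor((X @ directions + offsets) / width).astype(np.int64)
    # Rows that share the same grid cell receive the same integer bucket id.
    _, bucket_ids = np.unique(cells, axis=0, return_inverse=True)
    return bucket_ids.ravel()

def capped_sample(bucket_ids, per_bucket=10, seed=0):
    """Keep at most `per_bucket` rows from every bucket (assumed sampling rule)."""
    rng = np.random.default_rng(seed)
    keep = []
    for b in np.unique(bucket_ids):
        members = np.flatnonzero(bucket_ids == b)
        if members.size > per_bucket:
            members = rng.choice(members, size=per_bucket, replace=False)
        keep.append(members)
    return np.sort(np.concatenate(keep))

def sampled_pca(X, n_components=2, per_bucket=10):
    """Fit PCA on the capped subsample, then project all rows of X."""
    bucket_ids = lsh_buckets(X)
    idx = capped_sample(bucket_ids, per_bucket=per_bucket)
    mean = X[idx].mean(axis=0)
    _, _, vt = np.linalg.svd(X[idx] - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, components, idx, bucket_ids

# Synthetic stand-in for a normalized cells x genes matrix:
# one dominant population and one small, well-separated population.
rng = np.random.default_rng(1)
major = rng.normal(0.0, 1.0, size=(2000, 10))
minor = rng.normal(6.0, 1.0, size=(40, 10))
X = np.vstack([major, minor])

embedding, components, idx, bucket_ids = sampled_pca(X)
print(embedding.shape, len(idx))
```

Because every occupied bucket contributes at least one cell while dense buckets are capped, the subsample thins out redundant regions without dropping sparse ones. A real single-cell matrix would likely need sparse input handling and a truncated SVD instead of the dense `np.linalg.svd` used here.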
