Journal: PLoS Computational Biology

coupleCoC+: An information-theoretic co-clustering-based transfer learning framework for the integrative analysis of single-cell genomic data


Abstract

Technological advances have enabled us to profile multiple molecular layers at unprecedented single-cell resolution, and the number of available datasets from multiple samples or domains is growing. These datasets, including scRNA-seq, scATAC-seq and sc-methylation data, usually differ in their power to identify unknown cell types through clustering. Methods that integrate multiple datasets can therefore potentially improve clustering performance. Here we propose coupleCoC+ for the integrative analysis of single-cell genomic data. coupleCoC+ is a transfer learning method based on the information-theoretic co-clustering framework. It uses the information in one dataset, the source data, to facilitate the analysis of another dataset, the target data. coupleCoC+ uses the features linked between the two datasets for effective knowledge transfer, and it also uses the information in the target-data features that are not linked to the source data. In addition, coupleCoC+ matches similar cell types across the source data and the target data. By applying coupleCoC+ to the integrative clustering of mouse cortex scATAC-seq and scRNA-seq data, mouse and human scRNA-seq data, mouse cortex sc-methylation and scRNA-seq data, and human blood dendritic cell scRNA-seq data from two batches, we demonstrate that coupleCoC+ improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. coupleCoC+ converges quickly and is computationally efficient. The software is available at https://github.com/cuhklinlab/coupleCoC_plus.
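
To make "information-theoretic co-clustering" concrete, the sketch below shows only the single-dataset building block that such frameworks are generally built on: a cells-by-features count matrix is treated as a joint distribution, and a co-clustering is scored by how much mutual information is lost when cells and features are merged into clusters. This is an illustrative assumption, not the coupleCoC+ implementation; the function names and the toy matrix are hypothetical, and the coupling between source and target data through linked features is not shown.

```python
import numpy as np

def mutual_information(p):
    """Mutual information (in nats) of a joint distribution stored as a 2-D array."""
    p = np.asarray(p, dtype=float)
    px = p.sum(axis=1, keepdims=True)   # row marginal (cells or cell clusters)
    py = p.sum(axis=0, keepdims=True)   # column marginal (features or feature clusters)
    mask = p > 0                        # zero entries contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

def coclustered_joint(p, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum the joint distribution over row clusters and column clusters."""
    rows = np.eye(n_row_clusters)[np.asarray(row_labels)]   # one-hot, shape (n_rows, k)
    cols = np.eye(n_col_clusters)[np.asarray(col_labels)]   # one-hot, shape (n_cols, l)
    return rows.T @ p @ cols

def information_loss(counts, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Loss in mutual information caused by a co-clustering (smaller is better)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    q = coclustered_joint(p, row_labels, col_labels, n_row_clusters, n_col_clusters)
    return mutual_information(p) - mutual_information(q)

# Hypothetical toy data: 4 cells x 4 features with two clean blocks.
toy_counts = np.array([
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
])

# Labels that follow the block structure versus labels that mix the two blocks of cells.
loss_matched = information_loss(toy_counts, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
loss_mixed = information_loss(toy_counts, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2)
print(loss_matched, loss_mixed)
```

On this toy matrix the labels that follow the block structure lose no mutual information, while the mixed cell labels lose all of it. A transfer-learning variant would, under the abstract's description, additionally share information between a source and a target matrix through their linked features.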
