...
Journal: Molecular Systems Biology

scClassify: sample size estimation and multiscale classification of cells using single and multiple reference



Abstract

Automated cell type identification is a key computational challenge in the analysis of single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and on cell type hierarchies constructed from one or more annotated reference datasets. scClassify enables estimation of the sample size required to accurately classify cell types in a cell type hierarchy, and allows joint classification of cells when multiple references are available. We show that scClassify consistently outperforms other supervised cell type classification methods across 114 pairs of reference and test datasets, spanning diverse combinations of dataset size, sequencing technology and level of complexity. We further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application: identifying subpopulations of cells in the Tabula Muris data that were not identified in the original publication. Together, scClassify represents state‐of‐the‐art methodology for automated cell type identification from scRNA‐seq data.

Synopsis

scClassify is a multiscale classification framework based on ensemble learning and cell type hierarchies. It enables estimation of the sample size required for accurate cell type classification, as well as joint classification of cells using multiple references.

scClassify performs multiscale cell type classification based on cell type hierarchies constructed from single or multiple reference datasets.

It implements a post‐hoc clustering procedure for discovering novel cell types among cells that remain unassigned because their types are absent from the reference data.

It enables estimation of the number of cells required in a reference dataset to accurately discriminate a given cell type in a cell type hierarchy.

Application to large atlas datasets such as Tabula Muris demonstrates its ability to refine cell types and to identify cells from sub‐populations.
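The synopsis states that scClassify estimates how many reference cells are needed to discriminate a given cell type. One common way to make such an estimate, offered here as an assumption rather than as a description of the package's actual code, is to fit an inverse power law learning curve, acc(n) = a - b * n^(-c), to classification accuracies measured on subsamples of the reference, then solve for the smallest n that reaches a target accuracy. In the Python sketch below, `fit_inverse_power_law` and `required_sample_size` are hypothetical names, the grid of exponents is arbitrary, and the accuracies are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_inverse_power_law(
    sizes: Sequence[float], accuracies: Sequence[float]
) -> Tuple[float, float, float]:
    """Hypothetical helper: fit acc(n) = a - b * n**(-c) by grid search over c.

    For a fixed c the model is linear in (a, b), so a and b come from
    ordinary least squares; the c with the smallest squared error wins.
    Returns (a, b, c), where a is the accuracy plateau.
    """
    if len(set(sizes)) < 2:
        raise ValueError("need at least two distinct reference sizes")
    y_mean = sum(accuracies) / len(accuracies)
    best = None
    for i in range(1, 201):  # candidate exponents c = 0.01, 0.02, ..., 2.00
        c = i / 100
        xs = [n ** (-c) for n in sizes]
        x_mean = sum(xs) / len(xs)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, accuracies))
        slope = sxy / sxx  # slope of accuracy against x = n**(-c) equals -b
        a = y_mean - slope * x_mean
        b = -slope
        sse = sum((a - b * x - y) ** 2 for x, y in zip(xs, accuracies))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def required_sample_size(a: float, b: float, c: float, target: float) -> int:
    """Hypothetical helper: smallest n with a - b * n**(-c) >= target."""
    if not (b > 0 and target < a):
        raise ValueError("target accuracy must lie below the fitted plateau a")
    # Solve a - b * n**(-c) = target for n, then round up to whole cells.
    return math.ceil((b / (a - target)) ** (1 / c))


if __name__ == "__main__":
    # Synthetic, noise-free learning curve (illustrative only).
    sizes = [20, 50, 100, 200, 500, 1000]
    accs = [0.95 - 0.8 * n ** -0.5 for n in sizes]
    a, b, c = fit_inverse_power_law(sizes, accs)
    print(required_sample_size(a, b, c, target=0.92))  # → 712
```

Fixing the exponent c makes the model linear in a and b, which keeps the sketch free of dependencies; a nonlinear least-squares routine such as `scipy.optimize.curve_fit` could instead fit all three parameters jointly.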
