Asia Pacific Bioinformatics Conference

Ultrafast clustering of single-cell flow cytometry data using FlowGrid

Abstract

Background: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It can measure the expression of tens of cell surface protein markers in millions of single cells, which makes it a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them do not scale to very large data sets with more than ten million cells.

Results: Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. The algorithm is implemented in Python as an open-source package, FlowGrid. FlowGrid is memory efficient and scales linearly with the number of cells. We evaluated FlowGrid against other state-of-the-art clustering programs and found that it produces similar clustering results in substantially less time. For example, FlowGrid completes a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error.

Conclusions: FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid.
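The general idea of pairing density-based clustering with a grid can be illustrated with a small sketch. This is not FlowGrid's code or its exact algorithm. It is a toy version under stated assumptions: the function name `grid_density_cluster` and the parameters `n_bins` and `min_count` are hypothetical, inputs are assumed to be already scaled to [0, 1], and two bins count as neighbours when they differ by at most one step in every dimension. The point it shows is that once cells are counted into bins, the density and connectivity steps operate on occupied bins rather than on individual cells.

```python
from collections import Counter, deque
from itertools import product


def grid_density_cluster(points, n_bins=4, min_count=3):
    """Toy grid-based density clustering (illustrative; not FlowGrid's code).

    points: list of equal-length tuples with values already scaled to [0, 1].
    Returns one label per point; -1 marks points that fall in sparse bins.
    """
    # 1. Map each point to an integer grid cell (bin) in every dimension.
    def to_bin(p):
        return tuple(min(int(x * n_bins), n_bins - 1) for x in p)

    bins = [to_bin(p) for p in points]

    # 2. Count points per occupied bin and keep the dense ones.
    counts = Counter(bins)
    dense = {b for b, c in counts.items() if c >= min_count}

    # 3. Join dense bins that touch each other (assumed neighbour rule:
    #    at most one step apart in every dimension) with a breadth-first search.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    bin_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in bin_label:
            continue
        bin_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for o in offsets:
                nb = tuple(c + d for c, d in zip(cur, o))
                if nb in dense and nb not in bin_label:
                    bin_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # 4. Every point inherits the label of its bin; sparse bins become noise.
    return [bin_label.get(b, -1) for b in bins]


# Two tight groups in opposite corners plus one isolated point.
pts = [(0.05, 0.05), (0.10, 0.10), (0.12, 0.08),
       (0.90, 0.90), (0.95, 0.92), (0.88, 0.97),
       (0.50, 0.10)]
labels = grid_density_cluster(pts, n_bins=4, min_count=3)
print(labels)
```

Binning and labelling each take one pass over the points, which is consistent with the linear scaling in the number of cells described in the abstract. The neighbour enumeration in this sketch grows as 3^d with the number of dimensions d, so it is only practical for a few markers; a real implementation would need a different neighbour search.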
