GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets

Journal: GigaScience

Abstract

Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing that amount of high-dimensional data places heavy demands on both the hardware and the software of high-performance computational resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.

Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.

Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
