GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets

Kratochvíl Miroslav; Hunewald Oliver; Heirendt Laurent; Verissimo Vasco; Vondrášek Jiří; Satagopam Venkata P; Schneider Reinhard; Trefois Christophe; Ollert Markus

Abstract

Background: The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of datasets with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. Analyzing that amount of high-dimensional data places heavy demands on both the hardware and the software of high-performance computing resources. Current software tools often do not scale to datasets of such size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.

Results: We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.

Conclusions: GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example use on data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.

Bibliographic record

Source: GigaScience, 2020, issue 11, 8 pages
Authors: Kratochvíl Miroslav; Hunewald Oliver; Heirendt Laurent; Verissimo Vasco; Vondrášek Jiří; Satagopam Venkata P; Schneider Reinhard; Trefois Christophe; Ollert Markus
Original format: PDF
Keywords: high-performance computing; single-cell cytometry; self-organizing maps; clustering; dimensionality reduction; Julia

Similar literature

1. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space [J]. Yaniv Loewenstein, Elon Portugaly, Menachem Fromer, Bioinformatics. 2008, issue 13
2. SWIFT - Scalable Clustering for Automated Identification of Rare Cell Populations in Large, High-Dimensional Flow Cytometry Datasets, Part 2: Biological Evaluation [J]. Tim R. Mosmann, Iftekhar Naim, Jonathan Rebhahn, Cytometry, Part A: the journal of the International Society for Analytical Cytology. 2014, issue 5
3. Dragonfly - Interactive Visualization of Huge Aerial Image Datasets [C]. B. Reitinger, M. Hoefler, A. Lengauer, Proceedings of the 21st International Congress for Photogrammetry and Remote Sensing (ISPRS 2008). 2008
4. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014
5. GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets [O]. Miroslav Kratochvíl, Oliver Hunewald, Laurent Heirendt. 2020
