Journal: Analytical Chemistry

Memory Efficient Principal Component Analysis for the Dimensionality Reduction of Large Mass Spectrometry Imaging Data Sets

Abstract

A memory efficient algorithm for the computation of principal component analysis (PCA) of large mass spectrometry imaging data sets is presented. Mass spectrometry imaging (MSI) enables two- and three-dimensional overviews of hundreds of unlabeled molecular species in complex samples such as intact tissue. PCA, in combination with data binning or other reduction algorithms, has been widely used in the unsupervised processing of MSI data and as a dimensionality reduction method prior to clustering and spatial segmentation. Standard implementations of PCA require the data to be stored in random access memory. This imposes an upper limit on the amount of data that can be processed, necessitating a compromise between the number of pixels and the number of peaks to include. With increasing interest in multivariate analysis of large 3D multislice data sets and ongoing improvements in instrumentation, the ability to retain all pixels and many more peaks is increasingly important. We present a new method which has no limitation on the number of pixels and allows an increased number of peaks to be retained. The new technique was validated against the MATLAB (The MathWorks Inc., Natick, Massachusetts) implementation of PCA (princomp) and then used to reduce, without discarding peaks or pixels, multiple serial sections acquired from a single mouse brain, a data set that was too large to be analyzed with princomp. Then, k-means clustering was performed on the reduced data set. We further demonstrate with simulated data of 83 slices, comprising 20 535 pixels per slice and equaling 44 GB of data, that the new method can be used in combination with existing tools to process an entire organ. MATLAB code implementing the memory efficient PCA algorithm is provided.
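The abstract does not spell out how the memory limit is avoided, so the following is only an illustrative sketch of one common approach, not the authors' MATLAB code. It assumes the covariance matrix can be accumulated from blocks of pixels, so that only a peaks-by-peaks matrix is held in memory, and that a second pass projects each block onto the retained components. The names `iter_chunks` and `chunked_pca`, the chunk size, and the synthetic data are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def iter_chunks(data, chunk_size):
    """Yield successive blocks of rows (pixels x peaks).

    Stands in for reading spectra from disk one block at a time.
    """
    for start in range(0, data.shape[0], chunk_size):
        yield data[start:start + chunk_size]


def chunked_pca(chunk_factory, n_peaks, n_components):
    """PCA from two streaming passes over the pixels.

    Only the n_peaks x n_peaks accumulator and one chunk are held at a time
    (the returned scores are n_pixels x n_components, which is small).
    """
    n = 0
    total = np.zeros(n_peaks)
    cross = np.zeros((n_peaks, n_peaks))

    # Pass 1: accumulate the pixel count, per-peak sums, and X^T X.
    for chunk in chunk_factory():
        chunk = np.asarray(chunk, dtype=float)
        n += chunk.shape[0]
        total += chunk.sum(axis=0)
        cross += chunk.T @ chunk

    mean = total / n
    # Sample covariance: (X^T X - n * mean mean^T) / (n - 1).
    cov = (cross - n * np.outer(mean, mean)) / (n - 1)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Pass 2: project each mean-centered chunk onto the retained components.
    scores = np.vstack([
        (np.asarray(chunk, dtype=float) - mean) @ eigvecs
        for chunk in chunk_factory()
    ])
    return eigvals, eigvecs, scores


# Synthetic "pixels x peaks" matrix, small enough to check against in-memory PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)) @ rng.normal(size=(20, 20))
eigvals, eigvecs, scores = chunked_pca(
    lambda: iter_chunks(X, 128), n_peaks=20, n_components=3
)
```

In this sketch the memory cost grows with the square of the number of peaks and is independent of the number of pixels, which is consistent with the trade-off the abstract describes. The one-pass covariance formula used here can lose precision when the per-peak means are large relative to the variances; a mean-first pass or an incremental update would be a more careful choice in that case.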