Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

CAMS-RS: Clustering Algorithm for Large-Scale Mass Spectrometry Data Using Restricted Search Space and Intelligent Random Sampling

Abstract

High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate, and many of the resulting spectra have a poor signal-to-noise (S/N) ratio. These low-S/N spectra may not be interpretable using conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling), for clustering raw mass spectrometry data. CAMS-RS uses a novel metric, called F-set, that exploits temporal and spatial patterns to accurately assess the similarity between two given spectra. The F-set similarity metric is independent of retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the number of spectra that must be compared. An intelligent sampling method is executed on individual bins, which allows the per-bin results to be merged into the final clusters. Our experiments, using experimentally generated data sets, show that the proposed algorithm clusters spectra with high accuracy and helps in interpreting low-S/N spectra. The CAMS-RS algorithm scales well as the number of spectra increases, and our implementation allows clustering of up to a million spectra within minutes.
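
The abstract does not define the F-set metric, the binning key, or the sampling rule, so the following is only a rough sketch of the general idea of binning spectra to restrict comparisons and then sampling within each bin. Everything in it is an assumption made for illustration: bins are keyed on precursor m/z, the similarity is a simple shared-peak fraction standing in for F-set, and the names `shared_peak_fraction` and `cluster_binned` are hypothetical. It is not the CAMS-RS implementation.

```python
import random
from collections import defaultdict


def shared_peak_fraction(a, b, tol=0.5):
    """Stand-in similarity (NOT the paper's F-set metric): fraction of peaks
    in the smaller spectrum that have a partner within `tol` in the other."""
    if not a or not b:
        return 0.0
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    matched = sum(1 for p in small if any(abs(p - q) <= tol for q in large))
    return matched / len(small)


def cluster_binned(spectra, bin_width=1.0, sample_size=2, threshold=0.6, seed=0):
    """spectra: list of (precursor_mz, [peak m/z, ...]) tuples.
    Returns a list of clusters, each a list of indices into `spectra`."""
    rng = random.Random(seed)

    # Restricted search space (assumed form): spectra are only ever compared
    # with spectra whose precursor m/z falls in the same fixed-width bin.
    bins = defaultdict(list)
    for idx, (precursor, _) in enumerate(spectra):
        bins[int(precursor // bin_width)].append(idx)

    clusters = []
    for key in sorted(bins):
        local = []  # clusters found inside this bin
        for idx in bins[key]:
            peaks = spectra[idx][1]
            for cluster in local:
                # Sampling (assumed form): compare against a few randomly
                # chosen members instead of every member of the cluster.
                sample = rng.sample(cluster, min(sample_size, len(cluster)))
                if any(shared_peak_fraction(peaks, spectra[j][1]) >= threshold
                       for j in sample):
                    cluster.append(idx)
                    break
            else:
                local.append([idx])  # no cluster matched: start a new one
        # "Merging" here is plain concatenation of the per-bin results.
        clusters.extend(local)
    return clusters


example = [
    (500.2, [100.0, 200.0, 300.0]),
    (500.4, [100.1, 200.2, 300.1]),  # same bin, near-identical peaks
    (500.3, [150.0, 250.0, 350.0]),  # same bin, different peaks
    (800.1, [100.0, 200.0, 300.0]),  # same peaks, but a different bin
]
print(cluster_binned(example))  # [[0, 1], [2], [3]]
```

In this sketch the fourth spectrum is never compared with the first two because it lies in a different bin, which is how binning cuts the number of comparisons. A fixed-width bin also splits spectra that sit on either side of a boundary, a case this simplified version does not handle.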