Journal: BMC Bioinformatics

MassComp, a lossless compressor for mass spectrometry data


Abstract

Mass spectrometry (MS) is a widely used technique in biology research and has become key in proteomics and metabolomics analyses. As a result, the amount of MS data has increased significantly in recent years. For example, the MS repository MassIVE contains more than 123 TB of data. Somewhat surprisingly, these data are stored uncompressed, which incurs a significant storage cost. An efficient representation of these data is therefore paramount to lessen the storage burden and facilitate their dissemination. We present MassComp, a lossless compressor optimized for the numerical (m/z)-intensity pairs that account for most of the MS data. We tested MassComp on several MS datasets and show that it delivers on average a 46% reduction in the size of the numerical data, and up to 89%. These results correspond to an average improvement of more than 27% when compared to the general-purpose compressor gzip and of 40% when compared to the state-of-the-art numerical compressor FPC. When tested on entire files retrieved from the MassIVE repository, MassComp achieves on average a 59% size reduction. MassComp is written in C++ and freely available at https://github.com/iochoa/MassComp . The compression performance of MassComp demonstrates its potential to significantly reduce the footprint of MS data and shows the benefits of designing specialized compression algorithms tailored to MS data. MassComp is an addition to the family of omics compression algorithms designed to lessen the storage burden and facilitate the exchange and dissemination of omics data.
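To make the "size reduction" figures above concrete, the following minimal sketch shows how such a percentage can be computed for the gzip baseline on a block of (m/z)-intensity pairs. It does not reproduce MassComp's algorithm or its file handling. The helper names (`size_reduction`, `synthetic_pairs`, `pack_pairs`), the synthetic data, and the choice to store each pair as two 64-bit floats are all assumptions made for illustration, so the number it prints says nothing about real MS files.

```python
import gzip
import random
import struct


def size_reduction(original: bytes, compressed: bytes) -> float:
    """Fraction of bytes saved; 0.46 corresponds to a 46% reduction."""
    return 1.0 - len(compressed) / len(original)


def synthetic_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Made-up (m/z, intensity) pairs with increasing m/z; not real MS data."""
    rng = random.Random(seed)
    mz = 100.0
    pairs = []
    for _ in range(n):
        mz += round(rng.uniform(0.01, 2.0), 4)
        pairs.append((mz, float(rng.randint(0, 5000))))
    return pairs


def pack_pairs(pairs: list[tuple[float, float]]) -> bytes:
    """Serialize each pair as two little-endian 64-bit floats (an assumed layout)."""
    return b"".join(struct.pack("<dd", mz, intensity) for mz, intensity in pairs)


pairs = synthetic_pairs(10_000)
raw = pack_pairs(pairs)
packed = gzip.compress(raw)

# Lossless means decompression returns exactly the original bytes.
assert gzip.decompress(packed) == raw
print(f"gzip size reduction: {size_reduction(raw, packed):.1%}")
```

The same `size_reduction` helper could be applied to the output of any other lossless compressor, which is how a relative improvement over gzip would be derived from two such measurements.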
