Molecular & Cellular Proteomics : MCP

Numerical Compression Schemes for Proteomics Mass Spectrometry Data

Abstract

The open XML format mzML, used for representation of MS data, is pivotal for the development of platform-independent MS analysis software. Although conversion from vendor formats to mzML must take place on a platform on which the vendor libraries are available (i.e. Windows), once mzML files have been generated, they can be used on any platform. However, the mzML format has turned out to be less efficient than vendor formats. In many cases, the naïve mzML representation is fourfold or even up to 18-fold larger compared with the original vendor file. In disk I/O limited setups, a larger data file also leads to longer processing times, which is a problem given the data production rates of modern mass spectrometers. In an attempt to reduce this problem, we here present a family of numerical compression algorithms called MS-Numpress, intended for efficient compression of MS data. To facilitate ease of adoption, the algorithms target the binary data in the mzML standard, and support in main proteomics tools is already available. Using a test set of 10 representative MS data files we demonstrate typical file size decreases of 90% when combined with traditional compression, as well as read time decreases of up to 50%. It is envisaged that these improvements will be beneficial for data handling within the MS community.
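To illustrate why a naive encoding of numeric arrays is large, and how a numerical transform can help before traditional compression is applied, here is a minimal Python sketch. It is not the MS-Numpress algorithms themselves, which the abstract does not detail. It assumes a generic fixed-point scaling followed by delta encoding, zlib, and base64. The function names, the `scale` parameter, and the synthetic m/z values are all hypothetical and chosen only for illustration.

```python
import base64
import struct
import zlib


def encode_naive(values):
    """Pack values as 64-bit little-endian doubles, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw)


def encode_delta(values, scale=1e5):
    """Illustrative scheme (an assumption, not MS-Numpress):
    fixed-point scaling, delta encoding, zlib, then base64."""
    ints = [round(v * scale) for v in values]
    # Store the first value as-is, then successive differences.
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return base64.b64encode(zlib.compress(raw))


def decode_delta(blob, scale=1e5):
    """Invert encode_delta; the result is accurate to about 0.5 / scale."""
    raw = zlib.decompress(base64.b64decode(blob))
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc / scale)
    return out


# Synthetic, evenly spaced m/z values (made up for this example).
mz = [400.0 + 0.0123 * i for i in range(10_000)]
naive = encode_naive(mz)
packed = encode_delta(mz)
restored = decode_delta(packed)
max_err = max(abs(a - b) for a, b in zip(mz, restored))
print(len(naive), len(packed), max_err)
```

The scheme is lossy: precision is bounded by the chosen `scale`, so it trades a small, controlled error for a smaller payload. Real spectra are less regular than this synthetic array, so the size reduction seen here should not be read as representative of the figures reported in the abstract.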
