Journal: Journal of computational science

A time series approach for clustering mass spectrometry data*

Abstract

Advanced statistical techniques and data mining methods are recognized as powerful support for mass spectrometry (MS) data analysis. In particular, because they are unsupervised, data clustering methods have attracted increasing interest for exploring, identifying, and discriminating pathological cases from MS clinical samples. Supporting biomarker discovery in protein profiles has drawn special attention from biologists and clinicians. However, the huge amount of information contained in a single sample, that is, the high dimensionality of MS data, makes the effective identification of biomarkers a challenging problem. In this paper, we present a data mining approach for the analysis of MS data in which the mining phase is cast as a clustering task. Under the natural assumption that MS data can be modeled as time series, we propose a new representation model that significantly reduces the dimensionality of such data while preserving its relevant features. Besides mitigating high dimensionality (which typically affects both the effectiveness and the efficiency of computational methods), the proposed representation model also alleviates the critical task of preprocessing the raw spectra within the overall MS data analysis process. We evaluated our MS data clustering approach on publicly available proteomic datasets; the experimental results show the effectiveness of the proposed approach, which can be used to aid clinicians in studying pathological states and formulating diagnoses.
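The abstract does not describe the proposed representation model in detail, so the following is only a rough sketch of the general idea: treat each spectrum as a time series, reduce its dimensionality, then cluster the reduced series. It substitutes piecewise aggregate approximation (PAA), a generic time series reduction technique, for the authors' model, and a plain k-means with farthest-point initialization for their clustering step. All function names, parameters, and the synthetic spectra are assumptions made for illustration.

```python
import math
import random


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each of n_segments near-equal chunks.

    NOTE: PAA is a stand-in here, not the representation model from the paper.
    """
    n = len(series)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for s in range(n_segments):
        start = s * n // n_segments
        end = (s + 1) * n // n_segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def synthetic_spectrum(peak_center, rng, length=1000, width=10.0):
    """Hypothetical spectrum: one Gaussian peak plus small noise (not real MS data)."""
    return [
        math.exp(-0.5 * ((i - peak_center) / width) ** 2) + rng.gauss(0.0, 0.01)
        for i in range(length)
    ]


rng = random.Random(0)
# Two made-up groups of samples whose dominant peak sits at different positions.
spectra = [synthetic_spectrum(c, rng) for c in (220, 225, 230, 720, 725, 730)]
reduced = [paa(s, 20) for s in spectra]  # 1000 intensity values -> 20 values
labels = kmeans(reduced, k=2)
print(len(spectra[0]), "->", len(reduced[0]))
print(labels)
```

In this toy setting each spectrum shrinks from 1000 values to 20 before clustering, which is the kind of trade-off the abstract refers to: far fewer dimensions, with the peak location that separates the groups still visible in the reduced series.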
