PLoS Computational Biology

Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics

Abstract

Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that uses a nearest-neighbour or support vector machine system to integrate heterogeneous data sources, considerably improving the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms by evaluating five experimental datasets from four different species, in conjunction with four different auxiliary data sources, to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high-resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.
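The idea of combining a primary (MS-based) data source with an auxiliary one in a nearest-neighbour classifier can be illustrated with a small sketch. This is not the pRoloc implementation: the function names, the toy data, the compartment labels and the per-class weighting scheme below are all assumptions made for illustration, and the abstract does not state how the framework actually weights or tunes the two sources. The sketch simply takes nearest-neighbour votes in each data source separately and blends them with a hypothetical per-class weight between 0 (trust only the auxiliary data) and 1 (trust only the primary data).

```python
from collections import Counter
from math import dist


def knn_votes(query, points, labels, k):
    """Count the class labels of the k points nearest to `query` (Euclidean)."""
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return Counter(labels[i] for i in order[:k])


def combined_knn_predict(q_primary, q_aux, primary, auxiliary, labels,
                         weights, k_p=3, k_a=3):
    """Blend neighbour votes from two data sources (illustrative scheme only).

    weights[c] is a hypothetical per-class weight in [0, 1]:
    1 uses only the primary votes for class c, 0 only the auxiliary votes.
    """
    votes_p = knn_votes(q_primary, primary, labels, k_p)
    votes_a = knn_votes(q_aux, auxiliary, labels, k_a)
    classes = sorted(set(labels))
    scores = {
        c: weights[c] * votes_p[c] / k_p + (1 - weights[c]) * votes_a[c] / k_a
        for c in classes
    }
    return max(classes, key=lambda c: scores[c]), scores


# Toy labelled marker proteins (made-up numbers, not real data).
# Primary: two-channel quantitative profiles; auxiliary: binary annotation vectors.
primary = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
           (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
auxiliary = [(1, 0, 0), (1, 0, 0), (1, 1, 0),
             (0, 0, 1), (0, 1, 1), (0, 0, 1)]
labels = ["ER", "ER", "ER", "Mito", "Mito", "Mito"]

# An unlabelled protein whose primary profile resembles the ER markers
# but whose auxiliary annotation resembles the Mito markers.
q_primary, q_aux = (0.6, 0.4), (0, 0, 1)

trust_primary = {"ER": 1.0, "Mito": 1.0}
trust_auxiliary = {"ER": 0.0, "Mito": 0.0}

label_p, _ = combined_knn_predict(q_primary, q_aux, primary, auxiliary,
                                  labels, trust_primary)
label_a, _ = combined_knn_predict(q_primary, q_aux, primary, auxiliary,
                                  labels, trust_auxiliary)
print(label_p, label_a)
```

In this toy setting the two sources disagree, so the predicted compartment depends entirely on the weights. In practice such weights would have to be chosen on labelled data, for example by cross-validation, rather than set by hand.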