Application of Singular Spectrum Analysis (SSA), Independent Component Analysis (ICA) and Empirical Mode Decomposition (EMD) for automated solvent suppression and automated baseline and phase correction from multi-dimensional NMR spectra


Abstract

A common problem in protein structure determination by NMR spectroscopy is the solvent artifact. Typically, a deuterated solvent is used instead of normal water. However, several experimental methods have been developed to suppress the solvent signal for cases in which a protonated solvent must be used or in which the signals of the remaining protons, even in a highly deuterated sample, are still too strong. For a protein dissolved in 90% H2O / 10% D2O, the concentration of solvent protons is about five orders of magnitude greater than the concentration of the solute protons of interest. Therefore, the evaluation of multi-dimensional NMR spectra may be incomplete, since certain resonances of interest (e.g. Hα proton resonances) are hidden by the solvent signal and since parts of the solvent signal may be misinterpreted as cross peaks originating from the protein. The experimental solvent suppression procedures are typically unable to recover these significant protein signals. Many post-processing methods have been designed to overcome this problem.

In this work, several algorithms for the suppression of the water signal have been developed and compared. In particular, it has been shown that Singular Spectrum Analysis (SSA) can be applied advantageously to remove the solvent artifact from NMR spectra of any dimensionality, whether acquired digitally or in analog mode. The investigated time domain signals (FIDs) are decomposed into water- and protein-related components by first embedding the data in the space of time-delayed coordinates. Eigenvalue decomposition is applied to the embedded data, and the component with the highest variance (typically the dominant solvent signal) is discarded before the embedding is reverted.
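The embedding, decomposition and reversal steps described above can be sketched as follows. This is a minimal illustration on a synthetic one-dimensional FID, not the AUREMOL_SSA implementation: the function name `ssa_remove_dominant`, the window length and all signal parameters are assumptions made for the example, and the group delay handling and normalization steps are omitted. The singular value decomposition of the trajectory matrix is used here as an equivalent of the eigenvalue decomposition of the lag-covariance matrix.

```python
import numpy as np

def ssa_remove_dominant(fid, window):
    """Remove the highest-variance SSA component from a 1D complex FID.

    Hypothetical helper for illustration; `window` is the embedding dimension.
    Returns (cleaned_fid, removed_component).
    """
    fid = np.asarray(fid, dtype=complex)
    n = fid.size
    k = n - window + 1
    # Embedding: traj[i, j] = fid[i + j], i.e. columns are time-delayed windows.
    traj = np.array([fid[j:j + window] for j in range(k)]).T
    # SVD of the trajectory matrix; singular values are sorted in descending order,
    # so index 0 is the component with the highest variance.
    u, s, vh = np.linalg.svd(traj, full_matrices=False)
    dominant = s[0] * np.outer(u[:, 0], vh[0])
    # Reverting the embedding: average all matrix entries that map to the
    # same time index (anti-diagonal averaging).
    component = np.zeros(n, dtype=complex)
    counts = np.zeros(n)
    for i in range(window):
        component[i:i + k] += dominant[i]
        counts[i:i + k] += 1
    component /= counts
    return fid - component, component

# Synthetic example (all values are arbitrary assumptions):
# a strong on-resonance "solvent" line plus a weak "solute" line at 150 Hz.
t = np.arange(512) * 1e-3                       # 1 ms dwell time
solvent = 1000.0 * np.exp(-t / 0.1)
solute = np.exp(2j * np.pi * 150.0 * t) * np.exp(-t / 0.1)
cleaned, removed = ssa_remove_dominant(solvent + solute, window=64)
print(np.linalg.norm(cleaned - solute) / np.linalg.norm(solute))
```

The printed value is the relative deviation of the cleaned FID from the solute-only FID. In this toy case the two lines are well separated in frequency, so the dominant component is almost purely solvent; for overlapping lines the separation would be less clean.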
Pre-processing (group delay management and signal normalization) and post-processing (inverse normalization, Fourier transformation, and phase and baseline corrections) of the NMR data are mandatory for better suppression performance. The optimal embedding dimension has been determined empirically through a specific qualitative and quantitative analysis of the components extracted from a back-calculated two-dimensional spectrum of HPr protein from Staphylococcus aureus.

Moreover, the investigation of experimental data (a three-dimensional 1H13C HCCH-TOCSY spectrum of Trx protein from Plasmodium falciparum and two-dimensional NOESY and TOCSY spectra of HPr protein from Staphylococcus aureus) has revealed the ability of the algorithm to recover resonances hidden underneath the water signal.

Diseases and the effects of drugs and lifestyle can be detected by NMR spectroscopy of samples containing biofluids (e.g. urine, blood, saliva). The detection of signals of interest in such spectra can likewise be hampered by the solvent. SSA has also been successfully applied to one-dimensional urine, blood and cell spectra.

The algorithm for automated solvent suppression has been introduced into the AUREMOL software package (AUREMOL_SSA). It is optionally followed by an automated baseline correction in the frequency domain (AUREMOL_ALS), which can also be used independently of the former algorithm. The automated recognition of baseline points is performed differently depending on the dimensionality of the data.

In order to investigate the limitations of SSA, it has been applied to spectra whose dominant signal is not the solvent (as in the case of WATERGATE solvent suppression and of back-calculated data that include no experimental water signal), thereby determining the optimal solvent-to-solute ratio.

Independent Component Analysis (ICA) is a valid alternative for water suppression when the solvent signal is not the dominant one in the spectrum (when it is smaller than half of the strongest solute resonance). In particular, two components are obtained: the solvent and the solute. ICA needs as input at least as many different spectra (mixtures) as there are components (source signals); thus the definition of a suitable protocol for generating a dataset of one-dimensional ICA-tailored inputs is straightforward.

ICA has proven able to overcome the SSA limitations and to recover resonances of interest that cannot be detected with SSA. ICA avoids all the pre- and post-processing steps, since it is applied directly in the frequency domain. On the other hand, in SSA the component to be removed is selected automatically (it is the one with the highest variance), whereas in ICA a visual inspection of the extracted components is still required, because the output order is arbitrary and scale and sign ambiguities may occur.

Empirical Mode Decomposition (EMD) has proven more suitable for automated phase correction than for solvent suppression. It decomposes the FID into several intrinsic mode functions (IMFs) whose oscillation frequency decreases from the first to the last, the last one identifying the solvent signal. The non-baseline regions, identified automatically in the Fourier transform of the sum of the first IMFs, are evaluated separately, and genetic algorithms are applied to determine the zero- and first-order terms for an optimal phase correction.
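To make the zero- and first-order phase terms concrete, the sketch below applies a linear phase ramp to a complex spectrum and searches for the pair of terms that removes negative intensity. It is an illustration under several assumptions, not the procedure of the thesis: the EMD step and the detection of non-baseline regions are skipped, the genetic algorithm is replaced by a plain grid search, the first-order term is taken as the total phase change across the spectral width, and the names `apply_phase`, `negative_area` and `grid_search_phase` are hypothetical.

```python
import numpy as np

def apply_phase(spectrum, phi0, phi1):
    """Multiply a complex spectrum by exp(i*(phi0 + phi1*k/N)); angles in radians."""
    ramp = np.arange(spectrum.size) / spectrum.size
    return spectrum * np.exp(1j * (phi0 + phi1 * ramp))

def negative_area(spectrum):
    """Summed magnitude of the negative real intensities (0 if none are negative)."""
    return -np.sum(np.minimum(spectrum.real, 0.0))

def grid_search_phase(spectrum, step0_deg=5, step1_deg=10):
    """Brute-force stand-in for the genetic algorithm: pick the (phi0, phi1)
    pair on a regular grid that minimizes the negative area."""
    best = (np.inf, 0.0, 0.0)
    for phi0 in np.deg2rad(np.arange(-180, 180, step0_deg)):
        for phi1 in np.deg2rad(np.arange(-180, 181, step1_deg)):
            score = negative_area(apply_phase(spectrum, phi0, phi1))
            if score < best[0]:
                best = (score, phi0, phi1)
    return best[1], best[2]

# Synthetic two-line spectrum with a known phase error (arbitrary values).
t = np.arange(1024)
fid = np.exp((2j * np.pi * 0.1 - 0.02) * t) + 0.5 * np.exp((2j * np.pi * 0.3 - 0.02) * t)
dephased = apply_phase(np.fft.fft(fid), np.deg2rad(40), np.deg2rad(60))
phi0, phi1 = grid_search_phase(dephased)
corrected = apply_phase(dephased, phi0, phi1)
print(negative_area(dephased), negative_area(corrected))
```

The negative-area criterion only makes sense when all peaks are expected to be positive and absorptive. A search of this kind can return any phase pair that satisfies the criterion, which is one reason a more careful objective and optimizer are needed for real spectra.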

The SSA and ALS algorithms have been applied before assigning the two-dimensional NOESY spectrum (with the program KNOWNOE) of the PSCD4 domain of the pleuralin protein in order to increase the number of distance restraints beyond those already available. A new routine to derive 3JHNHα couplings from torsion angles (Karplus relation), and vice versa, has been introduced into the AUREMOL software. Using the newly developed tools, a refined three-dimensional structure of the PSCD4 domain could be obtained.
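The Karplus relation mentioned above links the 3JHNHα coupling to the backbone torsion angle φ through J(φ) = A cos²(φ − 60°) + B cos(φ − 60°) + C. The sketch below evaluates it in both directions. The abstract does not state which parametrization the AUREMOL routine uses, so the default coefficients here (A = 6.51, B = −1.76, C = 1.60 Hz, the Vuister and Bax values) are an assumption, and the function names `karplus_j` and `karplus_phi` are hypothetical.

```python
import math

def karplus_j(phi_deg, a=6.51, b=-1.76, c=1.60):
    """3J(HN-Halpha) in Hz for a backbone angle phi in degrees.
    Default coefficients are an assumed parametrization, not taken from the source."""
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

def karplus_phi(j, a=6.51, b=-1.76, c=1.60):
    """All phi values in [-180, 180) degrees that reproduce the coupling j.
    The relation is not one-to-one, so up to four angles are returned."""
    disc = b * b - 4.0 * a * (c - j)
    if disc < 0:
        return []  # coupling outside the range reachable with these coefficients
    solutions = set()
    for sign in (1.0, -1.0):
        x = (-b + sign * math.sqrt(disc)) / (2.0 * a)   # x = cos(phi - 60)
        if -1.0 - 1e-9 <= x <= 1.0 + 1e-9:
            theta = math.degrees(math.acos(max(-1.0, min(1.0, x))))
            for t in (theta, -theta):
                phi = (t + 60.0 + 180.0) % 360.0 - 180.0
                solutions.add(round(phi, 6))
    return sorted(solutions)

j = karplus_j(-60.0)
print(j, karplus_phi(j))
```

Because several torsion angles give the same coupling, the inverse direction yields a set of candidate angles. Choosing among them requires additional information, such as other restraints.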

Bibliographic record

  • Author

    De Sanctis Silvia

  • Year 2013
  • Original format PDF
  • Language of text English
