Journal: Bioinformatics

A new modeling method in feature construction for the HSQC spectra screening problem



Abstract

Motivation: Large-scale biological analyses produce huge amounts of data, so the data analysis process needs to be automated. Sample screening in high-throughput NMR protein structure analysis is a typical example. In particular, screening by protein ¹H-¹⁵N heteronuclear single quantum coherence (HSQC) spectra must be done quantitatively by a human expert. One popular solution to this problem is data mining. Machine learning methods can automatically extract rules and achieve high prediction accuracy when a high-quality training dataset is available. However, they tend to act as black boxes, and the learned models risk overfitting the dataset.

Results: We propose a model that evaluates HSQC spectra for feature construction. The model calculates the similarity between the measured chemical shifts and those of a random coil peak model. We applied our feature construction method to the machine learning discrimination of HSQC spectra of folded proteins from those of unfolded ones, and compared our model-based features with conventional sequence-based features and image recognition features. The results revealed that our method has sufficient discrimination power and overfits the training data less than the other methods. In addition, our method reduced the complexity of the input data, which facilitates further investigation.
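The abstract does not say how the similarity to the random coil peak model is computed. Purely as an illustration of the general idea, the sketch below treats each HSQC peak as a (¹H, ¹⁵N) chemical shift pair in ppm and scores a spectrum by the mean Gaussian similarity of each measured peak to its nearest random-coil peak. Everything here is an assumption, not the authors' method: the function names (scaled_distance, random_coil_similarity), the ¹⁵N down-weighting factor of 0.14 (in the range commonly used to combine ¹H and ¹⁵N shift differences), the width sigma = 0.1 ppm, and the toy peak lists, which are invented numbers rather than real random coil values.

```python
import math
from typing import List, Sequence, Tuple

# A peak is a (1H shift, 15N shift) pair in ppm.
Peak = Tuple[float, float]


def scaled_distance(a: Peak, b: Peak, n_weight: float = 0.14) -> float:
    """Euclidean distance between two peaks, with the 15N difference
    down-weighted by n_weight (hypothetical choice) so that both axes
    contribute on a comparable scale."""
    dh = a[0] - b[0]
    dn = (a[1] - b[1]) * n_weight
    return math.hypot(dh, dn)


def random_coil_similarity(
    measured: Sequence[Peak],
    random_coil: Sequence[Peak],
    sigma: float = 0.1,
    n_weight: float = 0.14,
) -> float:
    """Hypothetical feature: mean Gaussian similarity of each measured
    peak to its nearest random-coil peak. Returns a value in [0, 1];
    values near 1 mean the spectrum looks like the random coil model."""
    if not measured or not random_coil:
        raise ValueError("both peak lists must be non-empty")
    total = 0.0
    for peak in measured:
        # Distance to the closest random-coil peak.
        d = min(scaled_distance(peak, rc, n_weight) for rc in random_coil)
        total += math.exp(-(d * d) / (2.0 * sigma * sigma))
    return total / len(measured)


# Toy data only (invented values, not real random coil shifts).
toy_random_coil: List[Peak] = [(8.3, 120.0), (8.2, 118.0), (8.1, 122.0)]
# Peaks clustered near the model, as expected for an unfolded protein.
toy_unfolded: List[Peak] = [(8.3, 120.1), (8.2, 118.2), (8.1, 121.9)]
# Widely dispersed peaks, as expected for a folded protein.
toy_folded: List[Peak] = [(9.4, 125.0), (7.1, 112.0), (10.2, 130.0)]

unfolded_score = random_coil_similarity(toy_unfolded, toy_random_coil)
folded_score = random_coil_similarity(toy_folded, toy_random_coil)
print(f"unfolded-like: {unfolded_score:.3f}, folded-like: {folded_score:.3f}")
```

A single scalar like this could serve as one input feature to a classifier; a real implementation would need random coil shifts predicted for the actual protein sequence and tuned parameters, which the abstract does not provide.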
