International Conference on Software Engineering and Data Mining

Prediction of the Protein O-glycosylation by Machine Learning and Statistical Characters around the Glycosylation Sites

Abstract

O-glycosylation of mammalian proteins is investigated. It occurs specifically on serine or threonine residues, although no consensus sequence has yet been identified. We have applied support vector machines (SVMs) to predict O-glycosylation sites from various kinds of protein information, aiming to clarify the conditions for glycosylation and elucidate its mechanisms. In the present study, we first focus on the distribution of glycosylation sites. Many O-glycosylated sites occur in clusters of closely spaced glycosylated sites, whereas the others are sparse or isolated. These two types of sites, crowded and isolated, might be glycosylated by different mechanisms. We therefore divide all O-glycosylation sites into a crowded group and an isolated group, and train a separate SVM to predict the O-glycosylation sites of each group. The two SVMs depend on the input information in different ways. The results suggest that specific sequence motifs are likely to exist for the isolated group, whereas for the crowded group glycosylation is affected by interactions between glycosylated sites and by the relative proportions of the surrounding amino acids. We then compare the statistics of the amino acid sequences around the glycosylation sites of the two groups. Some amino acids (proline, valine, alanine, etc.) show high probabilities of occurrence at specific positions relative to a glycosylation site, especially for isolated glycosylation. Moreover, independent component analysis (ICA) of the amino acid sequences reveals position-specific occurrences of these amino acids, including the well-known prolines at positions -1 and +3, which appear as separate independent components. Finally, we investigate the relationship between O-glycosylation and the domain structure or disordered regions of the protein. O-glycosylation is clearly observed more frequently in disordered regions and less frequently within domains.
This may be a key feature for understanding the non-conserved nature of O-glycosylation and its roles in functional diversity and structural stability.
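The abstract does not state how "closely spaced" is defined or exactly how the positional statistics were computed. The Python sketch below illustrates the general idea under stated assumptions only: a site is called crowded if another glycosylated Ser/Thr lies within a hypothetical `MAX_GAP` residues of it (otherwise isolated), and residue frequencies are then tallied at each offset around the sites of one group. The names `split_sites`, `positional_frequencies`, `MAX_GAP` and `HALF_WINDOW`, and their values, are illustrative and are not taken from the paper.

```python
from collections import Counter

# Hypothetical parameters: the paper's actual thresholds are not given in the abstract.
MAX_GAP = 5      # assumed: sites within this many residues of each other count as "crowded"
HALF_WINDOW = 4  # assumed: residues at offsets -4..+4 around each site are examined


def split_sites(sites, max_gap=MAX_GAP):
    """Split 0-based glycosylation-site positions into (crowded, isolated) lists.

    A site is crowded if its nearest neighbouring site on either side is at
    most max_gap residues away; otherwise it is isolated.
    """
    sites = sorted(sites)
    crowded, isolated = [], []
    for i, pos in enumerate(sites):
        near_prev = i > 0 and pos - sites[i - 1] <= max_gap
        near_next = i + 1 < len(sites) and sites[i + 1] - pos <= max_gap
        (crowded if near_prev or near_next else isolated).append(pos)
    return crowded, isolated


def positional_frequencies(seq, sites, half_window=HALF_WINDOW):
    """Return {offset: {residue: fraction}} for offsets -half_window..+half_window.

    Offset 0 (the Ser/Thr site itself) is skipped, and positions that fall
    outside the sequence are ignored when computing the fractions.
    """
    freqs = {}
    for k in range(-half_window, half_window + 1):
        if k == 0:
            continue
        counts = Counter(seq[p + k] for p in sites if 0 <= p + k < len(seq))
        total = sum(counts.values())
        freqs[k] = {aa: n / total for aa, n in counts.items()} if total else {}
    return freqs
```

Usage on a made-up toy sequence (not data from the paper), with sites at 0-based positions 3, 5, 7 and 20:

```python
seq = "MKATPSATPGGGGGGGGGAPTAAPK"
crowded, isolated = split_sites([20, 3, 7, 5])  # -> [3, 5, 7], [20]
iso = positional_frequencies(seq, isolated)
print(iso[-1], iso[3])  # {'P': 1.0} {'P': 1.0}
```

In a real analysis the per-group frequencies would be compared against background residue frequencies over many proteins; a fixed distance threshold is only one simple way to define "crowded".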
