Journal: Genes

PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection



Abstract

A protein quaternary structure complex, also known as a multimer, plays an important role in the cell. For example, the dimer structure of transcription factors is involved in gene regulation, while the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus. Classifying protein quaternary structure complexes would therefore be of great help to proteomics research in the post-genome era, yet classification systems for protein quaternary structures have not been widely developed. In this study, we designed a two-layer machine learning architecture and developed the classification system PClass. Protein quaternary structures are divided into five categories: monomer, dimer, trimer, tetramer, and other subunit classes. We propose a new model selection method within the framework of the bootstrap method with a support vector machine. In the first layer, each type of complex is classified based on sequences, entropy, and accessible surface area, generating multiple feature modules; the best-performing model is then selected as the feature module for each type of complex. At this stage, the best performance reaches a Matthews correlation coefficient (MCC) of up to 70%. The second layer integrates the first-layer modules and uses six machine learning methods to improve prediction performance, raising the MCC by more than 10%. Finally, we analyzed the performance of our classification system using transcription factors in dimer structure and virus-infection-associated glycoproteins in trimer structure. PClass is available via a web interface at http://predictor.nchu.edu.tw/PClass/.
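The abstract does not spell out how the bootstrap-based model selection works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary one-vs-rest task (for example, dimer versus non-dimer), scores each candidate feature module by its mean out-of-bag MCC over bootstrap resamples, and keeps the best one. To stay dependency-free, a nearest-centroid classifier stands in for the support vector machine; all function names (`mcc`, `bootstrap_mcc`, `select_feature_module`) and the out-of-bag evaluation scheme are assumptions made for illustration.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): store one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def centroid_predict(centroids, X):
    """Assign each sample to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(centroids[c], x)) for x in X]


def bootstrap_mcc(X, y, rounds=50, seed=0):
    """Mean out-of-bag MCC over bootstrap resamples (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        # Skip resamples with no held-out data or only one training class.
        if not oob or len({y[i] for i in idx}) < 2:
            continue
        model = centroid_fit([X[i] for i in idx], [y[i] for i in idx])
        pred = centroid_predict(model, [X[i] for i in oob])
        scores.append(mcc([y[i] for i in oob], pred))
    return sum(scores) / len(scores) if scores else 0.0


def select_feature_module(modules, y, rounds=50, seed=0):
    """modules maps a module name to its feature matrix; returns (best, scores)."""
    scores = {name: bootstrap_mcc(X, y, rounds, seed) for name, X in modules.items()}
    return max(scores, key=scores.get), scores
```

In use, `modules` would hold one feature matrix per candidate representation (for instance sequence-, entropy-, or accessible-surface-area-based features) over the same proteins, and `select_feature_module(modules, y)` returns the name of the module with the highest mean out-of-bag MCC along with all scores. Swapping in a real SVM only requires replacing `centroid_fit` and `centroid_predict`.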
