Journal of Theoretical Biology

EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection


Abstract

The extracellular matrix (ECM) is a major component of the tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training dataset and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins and validated them using Gene Ontology and InterPro. The dataset and a standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred.
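The title names maximum relevance minimum redundancy (mRMR) feature selection, but the abstract does not say which features or which mRMR variant were used. As a rough illustration only, the sketch below implements a greedy mRMR in its common "relevance minus mean redundancy" form on discrete toy data. The feature names (`f_a`, `f_a_dup`, `f_b`, `f_noise`), the toy values, and the helper functions are all hypothetical and are not taken from EcmPred; in a full pipeline, the selected features would then be passed to a random forest classifier, which is not shown here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR, difference form (an assumed variant, not necessarily EcmPred's).

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature column
    Returns the names of up to k features in selection order.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 4 positive and 4 negative examples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "f_a":     [1, 1, 1, 0, 0, 0, 0, 0],  # strongly related to the label
    "f_a_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of f_a (fully redundant)
    "f_b":     [0, 1, 1, 1, 0, 0, 0, 1],  # weaker, but adds different information
    "f_noise": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
}

if __name__ == "__main__":
    print(mrmr_select(features, labels, 2))
```

On this toy data the duplicate column is as relevant as `f_a` but is penalized for redundancy once `f_a` has been chosen, which is the behavior mRMR is designed to produce.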
