International Conference on Information Technology for Manufacturing Systems

Identification and Analysis of Single- and Multiple-region Mitotic Protein Complexes by Grouping Gene Ontology Terms

Abstract

Many mitotic proteins assemble into protein super-complexes in three regions, the midbody, centrosome, and kinetochore (MCK), which play distinct roles in modulating mitosis. However, more than 16% of mitotic proteins are located in multiple regions. Identifying these mitotic proteins in advance would help elucidate the molecular regulatory mechanisms of this organelle. A few ensemble-classifier methods can address this problem, but they typically fuse various complementary features. Among these features, Gene Ontology (GO) terms play an important role, but the GO-term search space is massive and sparse. This motivates the present work, which proposes an easily implemented method, mMck-GO. mMck-GO identifies a small number of GO terms with a support vector machine (SVM) and k-nearest neighbor (KNN) to predict single- and multiple-region MCK proteins. It uses a simple SVM-based grouping scheme. First, it assembles the GO terms into several groups according to the number of training proteins each term annotates. Then it measures which set of top-grouped GO terms performs best. A new MCK protein dataset of 701 proteins (611 single-region and 90 multiple-region) is established in this work. No MCK protein shares 25% or higher pairwise sequence identity with any other protein in the same region. On this dataset, the GO term with the largest number of annotations covers 49.2% of the training protein sequences, whereas 56.5% of the GO terms annotate only a single protein sequence. This illustrates both the sparsity of GO terms and the effectiveness of top-grouped GO terms in distinguishing MCK proteins. Accordingly, a small group of the top 134 GO terms is identified. mMck-GO fuses these GO terms with amino acid composition (AAC) as input features, yielding accuracies of 71.66% and 69.18%, the latter on independent testing.
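The grouping scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say exactly how groups are formed. Here, GO terms that annotate the same number of training proteins go into one group, and groups are ordered from most- to least-annotated. The cumulative union of the top groups is then scored by a caller-supplied `evaluate` function. That function is a hypothetical stand-in for the paper's SVM-based evaluation; in practice it might return the mean cross-validation accuracy from scikit-learn's `cross_val_score` with an `SVC`. The helper `sparsity_stats` computes the two statistics quoted in the abstract: the share of proteins covered by the top term, and the share of terms that annotate a single protein. All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical input format: protein ID -> set of GO term IDs annotating it.
Annotations = Dict[str, Set[str]]


def annotation_counts(train: Annotations) -> Counter:
    """Count how many training proteins each GO term annotates."""
    return Counter(term for terms in train.values() for term in terms)


def sparsity_stats(train: Annotations) -> Tuple[float, float]:
    """Return (share of proteins annotated by the most frequent term,
    share of terms annotating exactly one protein).
    Assumes a non-empty training set."""
    counts = annotation_counts(train)
    top_share = max(counts.values()) / len(train)
    singleton_share = sum(1 for c in counts.values() if c == 1) / len(counts)
    return top_share, singleton_share


def group_by_count(train: Annotations) -> List[List[str]]:
    """ASSUMPTION: one group per distinct annotation count, ordered from
    the most- to the least-annotated group (ties sorted by GO ID)."""
    counts = annotation_counts(train)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return [list(g) for _, g in groupby(ranked, key=lambda t: counts[t])]


def best_top_groups(groups: List[List[str]],
                    evaluate: Callable[[List[str]], float]) -> List[str]:
    """Score the union of the top-1, top-2, ... groups with `evaluate`
    (a placeholder for, e.g., SVM cross-validation accuracy) and keep
    the best-scoring term set."""
    best_terms: List[str] = []
    best_score = float("-inf")
    selected: List[str] = []
    for group in groups:
        selected = selected + group
        score = evaluate(selected)
        if score > best_score:
            best_terms, best_score = list(selected), score
    return best_terms
```

Consider four toy proteins. GO:0005737 annotates three of them, GO:0005515 annotates two, and two other terms annotate one each. For this set, `sparsity_stats` returns `(0.75, 0.5)`. Scoring only nested top groups keeps the search linear in the number of groups, rather than exploring arbitrary subsets of a sparse term space.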
The top 30 GO terms comprise eight, eight, and 14 terms from the molecular function, biological process, and cellular component branches, respectively. Besides centrosome and kinetochore, the 14 cellular-component GO terms relate to subcellular compartments, microtubules, membranes, and the spindle, with GO:0005737 (cytoplasm) ranked first. The eight molecular-function GO terms include GO:0005515 (protein binding), GO:0000166 (nucleotide binding), and GO:0005524 (ATP binding). Most of the eight biological-process GO terms relate to the cell cycle, cell division, and mitosis. Two, however, relate to regulation of transcription (GO:0045449) and to transport processes, which helps clarify the molecular regulatory mechanisms of this organelle. The top-grouped GO terms can serve as an indispensable feature set, alongside other feature types, for solving multi-class problems in the study of biological functions.
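Combining the selected GO terms with another feature type, here amino acid composition (AAC) as in mMck-GO, can be sketched roughly as below. Two details are assumptions, since the abstract specifies neither. The first is the binary GO encoding: 1 if the protein carries a selected term, else 0. The second is that non-standard residues such as X are dropped before computing AAC. The function names are hypothetical.

```python
from typing import List, Sequence, Set

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: relative frequency of each standard residue.
    ASSUMPTION: non-standard letters (e.g. X, U, B) are ignored."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def go_vector(terms: Set[str], selected: Sequence[str]) -> List[float]:
    """ASSUMPTION: binary indicator for each selected (top-grouped) GO term."""
    return [1.0 if t in terms else 0.0 for t in selected]


def fused_features(sequence: str, terms: Set[str],
                   selected: Sequence[str]) -> List[float]:
    """Concatenate the GO indicator vector with the 20-dimensional AAC."""
    return go_vector(terms, selected) + aac(sequence)
```

Each resulting vector has length equal to the number of selected terms plus 20. With the 134 GO terms reported above, each protein would therefore be represented by 154 features, which could then be passed to an SVM or KNN classifier.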
