Journal: BMC Bioinformatics

Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

Abstract

Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets.

Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we reported a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly more densely linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease in the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We found that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller.

Conclusion: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting the essential role of the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
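The two-way comparison described in the Results can be illustrated with a small sketch. The abstract does not specify the statistics the authors used, so everything below is an assumption made for illustration: the link density of a group is compared with randomly drawn protein sets of the same size (a permutation test), and the overlap between a network community and a GO group is scored with a hypergeometric tail probability. The protein names, the toy network, and the function names `link_density`, `density_p_value`, and `overlap_p_value` are hypothetical.

```python
import random
from itertools import combinations
from math import comb


def link_density(edges, group):
    """Fraction of all possible protein pairs within `group` that are linked."""
    members = sorted(set(group))
    if len(members) < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(members, 2)
                 if frozenset((a, b)) in edges)
    return linked / comb(len(members), 2)


def density_p_value(edges, nodes, group, trials=2000, seed=0):
    """Empirical p-value: how often a random protein set of the same size
    is at least as densely linked as `group` (illustrative null model)."""
    rng = random.Random(seed)
    observed = link_density(edges, group)
    size = len(set(group))
    hits = sum(1 for _ in range(trials)
               if link_density(edges, rng.sample(nodes, size)) >= observed)
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


def overlap_p_value(n_total, n_group, n_community, n_shared):
    """Hypergeometric tail P(overlap >= n_shared) for a community of
    n_community proteins drawn at random from n_total proteins, of which
    n_group belong to the annotation group."""
    upper = min(n_group, n_community)
    tail = sum(comb(n_group, k) * comb(n_total - n_group, n_community - k)
               for k in range(n_shared, upper + 1))
    return tail / comb(n_total, n_community)


# Toy network (hypothetical data): P1..P4 are fully interconnected,
# the remaining proteins form isolated pairs.
nodes = [f"P{i}" for i in range(1, 11)]
edges = {frozenset(pair) for pair in [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P5", "P6"), ("P7", "P8"), ("P9", "P10"),
]}
go_group = ["P1", "P2", "P3", "P4"]  # hypothetical annotation group

print("link density:", link_density(edges, go_group))
print("density p-value:", density_p_value(edges, nodes, go_group))
print("overlap p-value:", overlap_p_value(len(nodes), 4, 4, 4))
```

A low p-value from `density_p_value` corresponds to the first direction of the match (annotation groups are denser than chance), and a low value from `overlap_p_value` corresponds to the second (network communities overlap GO groups non-randomly). A real analysis would also need to correct for testing many groups at once.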
