Source: U.S. National Institutes of Health literature, Plant Physiology

Focus Issue on Plant Databases: Genome Cluster Database. A Sequence Family Analysis Platform for Arabidopsis and Rice

Abstract

The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa ssp. japonica) were clustered into families using both sequence similarity-based and domain-based clustering. These two fundamentally different methods produced separate cluster sets with complementary properties, so that each compensates for the other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function Gene Ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database, which offers interactive, user-friendly sequence family mining tools that help the plant science community analyze any family of interest. An automated clustering pipeline keeps the information current as the annotations of the two genomes are updated and the clustering is improved. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families in the two species.
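
The naming step and the shared-versus-restricted classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input dictionaries, the tie-breaking rule, the "unknown" fallback, and the toy protein identifiers are all assumptions made for illustration only.

```python
from collections import Counter


def name_clusters(clusters, go_descriptions):
    """Name each cluster after its most common molecular function GO description.

    clusters: dict mapping cluster id -> list of protein ids (hypothetical layout)
    go_descriptions: dict mapping protein id -> list of GO term descriptions
    """
    names = {}
    for cluster_id, members in clusters.items():
        counts = Counter()
        for protein in members:
            # Count each description at most once per protein.
            counts.update(set(go_descriptions.get(protein, [])))
        if counts:
            # Assumed tie-break: highest count first, then alphabetical order.
            names[cluster_id] = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        else:
            # Assumed fallback for clusters without any annotation.
            names[cluster_id] = "unknown"
    return names


def classify_cluster(members, species_of):
    """Label a cluster as a singlet, shared by both species, or restricted to one."""
    if len(members) == 1:
        return "singlet"
    species = {species_of[protein] for protein in members}
    return "shared" if len(species) > 1 else "restricted"


# Toy data with made-up identifiers, not taken from the database.
clusters = {"c1": ["at1", "at2", "os1"], "c2": ["os2"], "c3": ["at3", "at4"]}
go_descriptions = {
    "at1": ["kinase activity"],
    "at2": ["kinase activity", "ATP binding"],
    "os1": ["kinase activity"],
}
species_of = {"at1": "Arabidopsis", "at2": "Arabidopsis", "at3": "Arabidopsis",
              "at4": "Arabidopsis", "os1": "rice", "os2": "rice"}

cluster_names = name_clusters(clusters, go_descriptions)
cluster_classes = {cid: classify_cluster(m, species_of) for cid, m in clusters.items()}
```

Counting each description once per protein keeps a single heavily annotated sequence from dominating the family name; whether the original approach does the same is not stated in the abstract.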