Journal: Current Plant Biology

SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa


Abstract

The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world's most important crop, Oryza sativa L. Using the Burrows-Wheeler Aligner (BWA) and Genome Analysis Toolkit (GATK) variant calling on this dataset, we identified ~40 M single-nucleotide polymorphisms (SNPs). Five rice reference genomes representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93–11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through the web services of its application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties, originating from the International Rice Genebank Collection Information System (IRGCIS), and gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored in Hierarchical Data Format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, which allows controlled vocabularies from biological ontologies to be used as query constraints in SNP-Seek. In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.
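
To illustrate what "controlled vocabularies as query constraints" can look like in a Chado-style relational store, the following is a minimal sketch and not SNP-Seek's actual schema or code. It uses an in-memory SQLite database with a heavily reduced subset of Chado-like tables (cv, cvterm, stock, stock_cvterm). The vocabulary name "variety_group" and the helper functions are hypothetical. The variety-to-group assignments are the ones listed in the abstract.

```python
import sqlite3

# Variety groups of the five reference genomes, as listed in the abstract.
REFERENCE_GROUPS = {
    "Nipponbare": "temperate japonica",
    "IR 64": "indica",
    "93-11": "indica",
    "DJ 123": "aus",
    "Kasalath": "aus",
}


def build_demo_db():
    """Create a toy database with a reduced, Chado-like layout (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cv (
            cv_id INTEGER PRIMARY KEY,
            name  TEXT UNIQUE NOT NULL
        );
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
            name      TEXT NOT NULL,
            UNIQUE (cv_id, name)
        );
        CREATE TABLE stock (
            stock_id INTEGER PRIMARY KEY,
            name     TEXT UNIQUE NOT NULL
        );
        CREATE TABLE stock_cvterm (
            stock_id  INTEGER NOT NULL REFERENCES stock(stock_id),
            cvterm_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            PRIMARY KEY (stock_id, cvterm_id)
        );
        """
    )
    # "variety_group" is a hypothetical vocabulary name used for illustration.
    cv_id = conn.execute(
        "INSERT INTO cv (name) VALUES (?)", ("variety_group",)
    ).lastrowid
    term_ids = {}
    for term in sorted(set(REFERENCE_GROUPS.values())):
        term_ids[term] = conn.execute(
            "INSERT INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, term)
        ).lastrowid
    for variety, term in REFERENCE_GROUPS.items():
        stock_id = conn.execute(
            "INSERT INTO stock (name) VALUES (?)", (variety,)
        ).lastrowid
        conn.execute(
            "INSERT INTO stock_cvterm (stock_id, cvterm_id) VALUES (?, ?)",
            (stock_id, term_ids[term]),
        )
    conn.commit()
    return conn


def stocks_with_term(conn, cv_name, term_name):
    """Return stock names annotated with the given controlled-vocabulary term."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM stock AS s
        JOIN stock_cvterm AS sc ON sc.stock_id = s.stock_id
        JOIN cvterm AS t ON t.cvterm_id = sc.cvterm_id
        JOIN cv AS c ON c.cv_id = t.cv_id
        WHERE c.name = ? AND t.name = ?
        ORDER BY s.name
        """,
        (cv_name, term_name),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(stocks_with_term(db, "variety_group", "aus"))
```

Because the query joins against term rows rather than matching free-text labels, a search is limited to values that exist in the vocabulary, and a misspelled or unknown term simply returns no stocks.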