Journal: Plant and Cell Physiology

OrchidBase: A Collection of Sequences of the Transcriptome Derived from Orchids

Abstract

Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases will be useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by Sanger sequencing, whereas 37,908,032 reads were obtained by next-generation sequencing (NGS) on both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C#, and consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by using BLAST, and the results can be explored on the website and downloaded. Consequently, the establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.
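The last pipeline stage described above, storing assembled sequences in an SQL database and summarizing them, can be illustrated with a minimal sketch. This is not the OrchidBase implementation, which the abstract says is written in Perl and C#. The table name `transcript`, its columns, the helper names, and the toy sequences are all hypothetical, and SQLite stands in for whichever SQL server the authors used. Only the three reported counts come from the abstract.

```python
import sqlite3

# Counts reported in the abstract: contigs + singletons = non-redundant sequences.
REPORTED_CONTIGS = 8_501
REPORTED_SINGLETONS = 76_116
REPORTED_NON_REDUNDANT = 84_617


def summarize(contigs, singletons):
    """Return (sequence count, average length in bp) over contigs and singletons."""
    sequences = list(contigs.values()) + list(singletons.values())
    total = len(sequences)
    average = sum(len(seq) for seq in sequences) / total if total else 0.0
    return total, average


def store(conn, contigs, singletons):
    """Write assembled sequences into a hypothetical 'transcript' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript ("
        "id TEXT PRIMARY KEY, kind TEXT, seq TEXT, length INTEGER)"
    )
    rows = [(sid, "contig", seq, len(seq)) for sid, seq in contigs.items()]
    rows += [(sid, "singleton", seq, len(seq)) for sid, seq in singletons.items()]
    conn.executemany("INSERT INTO transcript VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Toy data (invented for illustration; not real orchid sequences).
toy_contigs = {"contig_1": "ATGGCTAGCT", "contig_2": "ATGCCG"}
toy_singletons = {"read_9": "ATGA"}

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
store(conn, toy_contigs, toy_singletons)

# The same summary can be computed in SQL once the rows are stored.
count, avg_len = conn.execute(
    "SELECT COUNT(*), AVG(length) FROM transcript"
).fetchone()
```

The same bookkeeping applied to the reported figures gives 8,501 + 76,116 = 84,617, which matches the number of non-redundant transcribed sequences stated in the abstract.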
