Source: Zebrafish (NIH-indexed literature)

Pseudo-De Novo Assembly and Analysis of Unmapped Genome Sequence Reads in Wild Zebrafish Reveal Novel Gene Content

Abstract

Zebrafish was the third vertebrate to have its genome officially completed, yet the reference remains incomplete: additions and corrections are ongoing, and in the current release, GRCz10, 13% of zebrafish cDNA sequences are still unmapped. This disparity may reflect population differences, given that the reference genome was generated from clonal individuals with limited genetic diversity. Support for this explanation comes from a recent analysis of a single wild zebrafish, which identified over 5.2 million SNPs and 1.6 million indels relative to the previous genome build, zv9. Re-examination of that sequence data set showed that 13.8% of high-quality sequence reads failed to align to GRCz10. By applying a novel bioinformatics de novo assembly pipeline to these unmappable reads, we identified 1,514,491 novel contigs covering ∼224 Mb of genomic sequence. Of these, 1083 contigs contained a potential gene coding sequence. Comparison with RNA-seq data confirmed that 362 contigs contained transcribed DNA sequence, suggesting that a large amount of functional genomic sequence remains unannotated in the zebrafish reference genome. The bioinformatics pipeline developed in this study will strengthen the zebrafish genome as a model for human disease research. Adapted to other organisms, the pipeline also offers a cost-efficient and effective method to identify and map novel genetic content in any genome, and will ultimately aid in completing the genomes of a broad range of species.
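The first computational step described above is collecting the reads that fail to align to GRCz10 before assembling them. The abstract does not give the authors' implementation, so the following is only a minimal Python sketch under stated assumptions: alignments are available as plain-text SAM, a read counts as unmapped when SAM FLAG bit 0x4 is set, and secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once when computing an unmapped fraction such as the 13.8% reported above. The function name `split_unmapped` is hypothetical.

```python
"""Sketch: separate reads that failed to align from a SAM file.

Assumptions (not from the paper): input is plain-text SAM, "unmapped"
means FLAG bit 0x4 is set, and secondary/supplementary records are
skipped so each read is counted exactly once.
"""
from typing import Iterable, List, Tuple

UNMAPPED = 0x4        # SAM FLAG: segment unmapped
SECONDARY = 0x100     # SAM FLAG: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG: supplementary alignment


def split_unmapped(sam_lines: Iterable[str]) -> Tuple[List[str], int, int]:
    """Return (FASTQ records of unmapped reads, total reads, unmapped reads)."""
    fastq: List[str] = []
    total = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignment of a read that is counted elsewhere
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
            # Emit a FASTQ record so the read can be fed to an assembler.
            fastq.append(f"@{qname}\n{seq}\n+\n{qual}\n")
    return fastq, total, unmapped


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tHHHH\n",
        "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    records, n_total, n_unmapped = split_unmapped(demo)
    # prints: 1/2 unmapped (50.0%)
    print(f"{n_unmapped}/{n_total} unmapped ({100 * n_unmapped / n_total:.1f}%)")
```

On real data, `samtools view -f 4` applies the same flag filter to SAM/BAM files; the sketch only makes the logic explicit. For paired-end data both mates share a QNAME, so a production extraction step would also need to preserve mate information before assembly.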