Journal: Human Genetics

Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches


Abstract

The human reference genome is still incomplete, and a number of gene sequences are missing from it. The approaches for uncovering these sequences, the reasons for their absence, and their functions remain little explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome using RNA-Seq data from 16 different human tissues. By combining genome-guided transcriptome reconstruction with genome-wide comparison, we uncovered 3.78 and 2.37 Mb of transcribed regions in the Celera and HuRef human genome assemblies that are either missing from the homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. From de novo transcriptome assembly, we further identified a significant number of novel transcript contigs in each tissue that cannot be aligned to NCBI build 37.2 but can be aligned to at least one of the Celera, HuRef, chimpanzee, macaque, or mouse genomes. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation, and other structural variations. Moreover, our results suggest that a large portion of these missing genes are conserved between human and other mammals, implying that they have important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.
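The selection criterion for novel transcript contigs described in the abstract (unalignable to NCBI build 37.2, but alignable to at least one other genome) can be illustrated as a simple set operation. The following is only a minimal sketch, not the authors' pipeline: it assumes the alignments have already been computed by some external aligner, and the function name, input layout, and contig IDs are all hypothetical.

```python
from typing import Dict, Iterable, Set


def candidate_missing_contigs(
    all_contigs: Iterable[str],
    aligned_to_reference: Set[str],
    aligned_to_others: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return contigs absent from the reference but found in another genome.

    all_contigs          -- IDs of de novo assembled transcript contigs
    aligned_to_reference -- IDs that align to the reference (e.g. build 37.2)
    aligned_to_others    -- genome name -> IDs that align to that genome
    The result maps each candidate contig to the genomes it aligns to.
    """
    # Step 1: keep only contigs that fail to align to the reference.
    unaligned = set(all_contigs) - aligned_to_reference

    # Step 2: of those, keep contigs with a hit in at least one other genome.
    candidates: Dict[str, Set[str]] = {}
    for contig in unaligned:
        hits = {name for name, ids in aligned_to_others.items() if contig in ids}
        if hits:
            candidates[contig] = hits
    return candidates


# Hypothetical toy input: four contigs, two alternative genomes.
example = candidate_missing_contigs(
    all_contigs=["c1", "c2", "c3", "c4"],
    aligned_to_reference={"c1"},
    aligned_to_others={"HuRef": {"c2", "c3"}, "chimpanzee": {"c3"}},
)
```

In this toy input, `c1` is dropped because it aligns to the reference, and `c4` is dropped because it aligns to no genome at all; such contigs would need separate handling (for example, as possible contaminants or assembly artifacts).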
