Journal: BMC Biology

Insertion variants missing in the human reference genome are widespread among human populations



Abstract

Structural variants comprise diverse genomic rearrangements, including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified, mainly due to ascertainment bias in the reference genome, lack of prior sequence knowledge, and the low complexity of typical insertion sequences. Although recent developments in long-read sequencing show promise for annotating individual non-reference insertions, population-level catalogues of non-reference insertion variants have not been compiled, and the possible functional roles of these hidden variants remain elusive. To detect non-reference insertion variants, we developed a pipeline, InserTag, which generates non-reference contigs by local de novo assembly and then infers the full sequence of insertion variants by tracing contigs from non-human primates and other human genome assemblies. Application of the pipeline to data from 2535 individuals of the 1000 Genomes Project identified 1696 non-reference insertion variants and re-classified them as retention of ancestral sequences or novel sequence insertions based on the ancestral state. Genotyping of the variants showed that individuals had, on average, 0.92 Mbp of sequence missing from the reference genome, that 92% of the variants were common (allele frequency > 5%) among human populations, and that more than half of the variants were major alleles. Among human populations, African populations were the most divergent and had the most non-reference sequence, which was attributed to the greater prevalence of high-frequency insertion variants. Subsets of the insertion variants were in high linkage disequilibrium with phenotype-associated SNPs and showed signals of recent continent-specific selection.
Non-reference insertion variants represent an important type of genetic variation in the human population, and our pipeline, InserTag, provides a framework for detecting and genotyping non-reference sequences missing from human populations.
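
The summary statistics reported in the abstract (allele frequency, common versus major alleles, ancestral-state classification, and per-individual non-reference sequence length) can be illustrated with a minimal Python sketch. This is not InserTag's code or interface. The class, field, and function names are hypothetical, the toy genotypes are invented for illustration only, and two definitions are assumptions: a "major" allele is taken to mean allele frequency above 50%, and an individual is counted as carrying an insertion if it has at least one copy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InsertionVariant:
    """Hypothetical record for one non-reference insertion (not InserTag's format)."""
    name: str
    length_bp: int
    present_in_primates: bool  # assumed flag: sequence also found in non-human primates
    genotypes: List[int]       # insertion-allele copies per individual: 0, 1, or 2


def allele_frequency(v: InsertionVariant) -> float:
    """Fraction of chromosomes carrying the insertion (diploid individuals assumed)."""
    return sum(v.genotypes) / (2 * len(v.genotypes))


def summarize(v: InsertionVariant) -> Dict[str, object]:
    """Label a variant by origin and frequency class."""
    af = allele_frequency(v)
    return {
        "name": v.name,
        "af": af,
        # The 5% threshold for "common" is taken from the abstract.
        "common": af > 0.05,
        # Assumption: "major allele" means the insertion is the more frequent allele.
        "major": af > 0.5,
        # Sequence shared with primates suggests the reference lost it;
        # otherwise treat it as a novel insertion.
        "origin": "ancestral_retention" if v.present_in_primates else "novel_insertion",
    }


def nonreference_bp_per_individual(variants: List[InsertionVariant]) -> List[int]:
    """Total length of insertions each individual carries (at least one copy)."""
    n_individuals = len(variants[0].genotypes)
    totals = [0] * n_individuals
    for v in variants:
        for i, copies in enumerate(v.genotypes):
            if copies > 0:
                totals[i] += v.length_bp
    return totals


# Toy data for four individuals; values are invented, not from the study.
variants = [
    InsertionVariant("ins_A", 300, True, [2, 2, 1, 0]),
    InsertionVariant("ins_B", 120, False, [0, 0, 1, 0]),
    InsertionVariant("ins_C", 50, False, [0, 0, 0, 0]),
]

summaries = [summarize(v) for v in variants]
totals = nonreference_bp_per_individual(variants)
```

With this toy input, `ins_A` is labeled both common and major, `ins_B` is common but not major, and `ins_C` is neither. Averaging `totals` across individuals gives the kind of per-individual figure the abstract reports as 0.92 Mbp.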
