Source: PLoS Genetics

Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

Abstract

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different-sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them, we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
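The core idea in the abstract, clusters of high-abundance oligonucleotides that are related in sequence space, can be illustrated with a toy sketch. This is not the P-clouds implementation: the function names, the single abundance threshold `min_count`, and the rule that two oligos are "related" when they differ by exactly one substitution are all assumptions made for illustration, and the published method's actual parameters and cloud-building rules are not given in this abstract.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count every overlapping k-mer (oligo) in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def one_substitution_neighbors(oligo, alphabet="ACGT"):
    """Yield every oligo that differs from `oligo` at exactly one position."""
    for i, base in enumerate(oligo):
        for other in alphabet:
            if other != base:
                yield oligo[:i] + other + oligo[i + 1:]


def build_clouds(counts, min_count):
    """Group high-abundance oligos into connected clusters ("clouds").

    Assumption: two oligos are linked when they differ by one substitution,
    and a single threshold decides which oligos count as abundant.
    """
    abundant = {oligo for oligo, n in counts.items() if n >= min_count}
    clouds, seen = [], set()
    for seed in sorted(abundant):
        if seed in seen:
            continue
        cloud, stack = set(), [seed]
        seen.add(seed)
        # Depth-first walk over abundant oligos reachable from the seed.
        while stack:
            current = stack.pop()
            cloud.add(current)
            for neighbor in one_substitution_neighbors(current):
                if neighbor in abundant and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clouds.append(cloud)
    return clouds


def mask_positions(sequence, clouds, k):
    """Mark each base that is covered by at least one cloud oligo."""
    members = set().union(*clouds) if clouds else set()
    covered = [False] * len(sequence)
    for i in range(len(sequence) - k + 1):
        if sequence[i:i + k] in members:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

In this sketch, a caller would run `count_oligos` over a genome, pass the counts to `build_clouds`, and then use `mask_positions` to annotate a query sequence. Because no consensus sequence is involved, even a short stretch can be flagged as repeat-like, which is also why short hits need the kind of probabilistic treatment the abstract mentions.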
