TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data


Abstract

Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline - TypeTE - which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
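As an illustration of the genotype-likelihood step mentioned in the abstract, the following is a minimal sketch of a generic diploid, biallelic likelihood model applied to reads that have been re-mapped to a presence allele and an absence allele. It is not TypeTE's actual implementation: the function names, the `(supports_insertion, mapq)` read representation, and the use of mapping quality as the per-read error probability are all assumptions made for this example.

```python
import math


def genotype_log_likelihoods(reads):
    """Return log10 likelihoods for 0, 1 and 2 copies of the insertion.

    `reads` is a hypothetical list of (supports_insertion, mapq) tuples:
    supports_insertion is True when the read maps best to the presence
    allele, and mapq is its Phred-scaled mapping quality.
    """
    log_l = [0.0, 0.0, 0.0]
    for supports_insertion, mapq in reads:
        # Convert Phred quality to an error probability. Clamp it so that
        # MAPQ 0 reads are uninformative (0.5) and log10(0) cannot occur.
        err = min(max(10 ** (-mapq / 10), 1e-10), 0.5)
        for g in range(3):
            # Probability that a read drawn from genotype g appears to
            # support the insertion, allowing for mis-assignment.
            p_ins = (g / 2) * (1 - err) + (1 - g / 2) * err
            p = p_ins if supports_insertion else 1 - p_ins
            log_l[g] += math.log10(p)
    return log_l


def call_genotype(reads):
    """Return (best genotype, Phred-scaled likelihoods normalised to 0)."""
    log_l = genotype_log_likelihoods(reads)
    best = max(log_l)
    phred = [round(-10 * (value - best)) for value in log_l]
    return log_l.index(best), phred


# Five reads supporting each allele suggest a heterozygous insertion.
het_reads = [(True, 30)] * 5 + [(False, 30)] * 5
het_genotype, het_phred = call_genotype(het_reads)

# Eight reads that all support the presence allele suggest a homozygote.
hom_reads = [(True, 30)] * 8
hom_genotype, hom_phred = call_genotype(hom_reads)
```

In this sketch the genotype is encoded as the number of insertion copies (0, 1 or 2), and the normalised Phred-scaled values follow the convention of the `PL` field in VCF files, where the most likely genotype has value 0.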

Bibliographic details

  • Source
    Nucleic Acids Research | 2020, Issue 6 | 13 pages
  • Author affiliations

    Cornell Univ Dept Mol Biol & Genet 215 Tower Rd Ithaca NY 14853 USA;

    Univ Utah Dept Human Genet Sch Med Salt Lake City UT 84112 USA;

    Johns Hopkins Univ Dept Pathol Sch Med Baltimore MD 21205 USA;

    Univ Michigan Dept Human Genet Med Sch Ann Arbor MI 48109 USA;

    Univ Utah Dept Human Genet Sch Med Salt Lake City UT 84112 USA;

    Univ Utah Dept Human Genet Sch Med Salt Lake City UT 84112 USA;

    Johns Hopkins Univ Dept Pathol Sch Med Baltimore MD 21205 USA;

    Univ Utah Dept Human Genet Sch Med Salt Lake City UT 84112 USA;

    Cornell Univ Dept Mol Biol & Genet 215 Tower Rd Ithaca NY 14853 USA;

  • Indexing information
  • Original format PDF
  • Text language eng
  • Chinese Library Classification: Biochemistry;
  • Keywords
