Journal: The American Journal of Human Genetics

Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data via TRUFFLE



Abstract

Relationship estimation and segment detection between individuals is an important aspect of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using un-phased data. By skipping the haplotype phasing step and, instead, relying on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified only three second cousin or closer pairs across different populations, while commonly used methods identified a large number of such pairs. Similarly, within populations, we identified many fewer related pairs. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and has fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When applied to pedigree data, we observed 99.6% accuracy in detecting 1st to 5th degree relationships. As genomic datasets become much larger, TRUFFLE can enable disease gene mapping through implicit shared haplotypes by accurate IBD segment detection.
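
The abstract does not spell out the region-based approach, so the following is only an illustrative sketch of one common way to look for shared segments in un-phased data, not TRUFFLE's actual algorithm. It assumes genotypes are coded as alternate-allele counts (0, 1, 2), treats a site where one individual is 0 and the other is 2 (opposite homozygotes) as evidence against sharing, and merges runs separated by a short gap as a crude stand-in for an error model. All function names, `min_len`, and `max_gap` are hypothetical. The sample size of 2,504 used for the pair count is the 1000 Genomes Phase 3 figure and is an assumption, since the abstract only gives the rounded 3.1 million.

```python
from math import comb


def n_pairs(n_samples):
    """Number of unordered pairs of individuals among n_samples."""
    return comb(n_samples, 2)


def opposite_homozygotes(g1, g2):
    """True if two unphased genotypes (0/1/2) share no allele."""
    return (g1 == 0 and g2 == 2) or (g1 == 2 and g2 == 0)


def strict_runs(a, b):
    """Half-open (start, end) marker runs with no opposite homozygotes."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if opposite_homozygotes(x, y):
            if start is not None:
                runs.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        runs.append((start, len(a)))
    return runs


def merge_runs(runs, max_gap=1):
    """Join neighbouring runs split by at most max_gap markers
    (a simplified, hypothetical way to tolerate genotyping errors)."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def candidate_segments(a, b, min_len=5, max_gap=1):
    """Candidate shared segments spanning at least min_len markers."""
    runs = merge_runs(strict_runs(a, b), max_gap)
    return [(s, e) for s, e in runs if e - s >= min_len]


if __name__ == "__main__":
    a = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1, 0, 2]
    b = [0, 1, 2, 1, 1, 0, 2, 2, 1, 1, 2, 2]
    print(n_pairs(2504))
    print(candidate_segments(a, b, max_gap=0))
    print(candidate_segments(a, b, max_gap=1))
```

With `max_gap=0` every opposite-homozygote site breaks a segment, while a small positive `max_gap` lets isolated discordant sites be bridged. A real tool would work in genetic distance rather than marker counts and would use a statistical error model in place of a fixed gap.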
