Journal of Computational Biology: A Journal of Computational Molecular Cell Biology

The Clark Phaseable Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS



Abstract

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat an experiment than to store the information it generated. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution for the sample. We show on simulated data that the algorithm recovers an almost perfect solution when the number of genotyped individuals is large enough, and that its accuracy grows with the sample size. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event occurs when, by the laws of Mendelian inheritance, an individual should be heterozygous but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but they play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss-of-heterozygosity inference algorithm was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present results similar to those obtained from the MS data.
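The abstract's premise is that a long run of identical alleles shared by two haplotypes is unlikely by chance, so such a tract suggests identity by descent and can seed a Clark-like phasing step. A minimal sketch of the tract-detection idea, assuming two already-aligned haplotype strings over the same sites (the function name and the `min_len` threshold are illustrative, not the paper's actual method):

```python
def shared_tracts(hap_a: str, hap_b: str, min_len: int):
    """Return (start, end) intervals where two aligned haplotypes are
    identical for at least min_len consecutive sites. Long identical
    tracts are improbable by chance, so they are candidate
    identical-by-descent (IBD) segments."""
    tracts, start = [], None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i          # a matching run begins here
        else:
            if start is not None and i - start >= min_len:
                tracts.append((start, i))
            start = None           # mismatch ends the run
    # close a run that extends to the end of the sequence
    if start is not None and len(hap_a) - start >= min_len:
        tracts.append((start, len(hap_a)))
    return tracts

# Two toy haplotypes sharing a 9-site tract at the start:
print(shared_tracts("ABBABABBAABB", "ABBABABBABAB", min_len=5))
# -> [(0, 9)]
```

In a Clark-style scheme, a tract found this way lets the known phase of one individual resolve the ambiguous heterozygous sites of the other within that region.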
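The LOH definition in the abstract is concrete enough to illustrate with a trio check: when each parent is homozygous for a different allele, Mendelian inheritance forces the child to be heterozygous, so a homozygous call at such a site is consistent with a deletion. A minimal sketch under that assumption (genotypes as two-character allele strings; function names are hypothetical):

```python
def mendelian_expects_het(father: str, mother: str) -> bool:
    """True when each parent is homozygous for a different allele
    (e.g., father 'AA', mother 'BB'), which forces the child to be
    heterozygous under Mendelian inheritance."""
    return (father[0] == father[1]
            and mother[0] == mother[1]
            and father[0] != mother[0])

def candidate_loh(father: str, mother: str, child: str) -> bool:
    """Flag a site where Mendelian inheritance requires heterozygosity
    but the child's called genotype is homozygous -- consistent with
    one allele being lost to a deletion polymorphism (LOH)."""
    return mendelian_expects_het(father, mother) and child[0] == child[1]

# Father AA, mother BB: the child must carry one A and one B,
# so a homozygous 'AA' call suggests the B allele was deleted.
print(candidate_loh("AA", "BB", "AA"))  # True  -> candidate LOH site
print(candidate_loh("AA", "BB", "AB"))  # False -> consistent genotype
```

Scanning genotypes of many trios for runs of such sites is the intuition behind flagging LOH regions before they can mislead a phasing algorithm.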
