Source: G3: Genes, Genomes, Genetics (U.S. National Institutes of Health literature collection)

Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution

Abstract

Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing data. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations and found that, for all programs tested and under demographic scenarios typical for modern humans, a significant proportion of inferred segments 1–2 cM long result from the conflation of two or more shorter segments, each at least 0.2 cM long. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
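
The direction of the bias described in the abstract can be illustrated with a small toy calculation. The sketch below is not the estimator analyzed in the paper. It assumes a deliberately simple model in which a segment whose common ancestor lived t generations ago has expected length 100 / (2t) cM, the age is estimated by inverting that expectation, and the mutation rate is estimated as observed differences divided by 2 × age × segment length in base pairs. The function names, the rule of thumb of 1 cM per Mb, the true rate of 1.2e-8, and the 0.6 cM segment length are all illustrative assumptions, and the gap between the two conflated segments is ignored.

```python
BP_PER_CM = 1_000_000  # assumption: about 1 cM per Mb, a rough human-genome rule of thumb


def naive_age(length_cm):
    """Toy age estimate in generations: inverts E[length] = 100 / (2 t) cM."""
    return 50.0 / length_cm


def expected_diffs(mu, t, length_cm):
    """Expected pairwise differences: 2 lineages * t generations * mu * base pairs."""
    return 2.0 * mu * t * length_cm * BP_PER_CM


def naive_mu(n_diffs, length_cm):
    """Toy mutation-rate estimate that trusts the length-based age."""
    return n_diffs / (2.0 * naive_age(length_cm) * length_cm * BP_PER_CM)


def conflation_demo(true_mu=1.2e-8, short_cm=0.6):
    """Compare estimates from two short segments with the estimate obtained
    when a detector reports them as one conflated segment."""
    # Choose the true age so that short_cm is exactly the expected length.
    true_t = naive_age(short_cm)
    diffs_each = expected_diffs(true_mu, true_t, short_cm)

    # Treated separately, each short segment recovers the true rate.
    mu_separate = naive_mu(diffs_each, short_cm)

    # Conflated: lengths and differences add, but the inferred age is
    # based on the longer reported segment and is therefore too young.
    merged_cm = 2 * short_cm
    mu_merged = naive_mu(2 * diffs_each, merged_cm)

    return {
        "true_t": true_t,
        "merged_t": naive_age(merged_cm),
        "mu_separate": mu_separate,
        "mu_merged": mu_merged,
    }


if __name__ == "__main__":
    for key, value in conflation_demo().items():
        print(f"{key}: {value:.4g}")
```

Under these assumptions, conflating two equal-length segments halves the inferred age and doubles the estimated mutation rate, which matches the direction of the bias reported in the abstract. The magnitude in real data depends on the detection program and the demographic history, so the factor of two here is only a property of the toy model.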