...
Human Heredity

Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error

Abstract

As with any new technology, next-generation sequencing (NGS) offers potential advantages and poses potential challenges. One advantage is the ability to identify multiple causal disease variants that SNP-chip technology might otherwise miss. Two potential challenges are misclassification error (as with any emerging technology) and loss of power due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and can be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. The observed data are the phenotypes of unrelated cases and controls, together with the coverage and the number of putative causal variants for every individual at every SNP. We simulate data under multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate the type I error rate and power for each combination of factor settings. We compare our results with those of two recently published NGS statistics. We also create a fictitious disease model based on 1000 Genomes data downloaded for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that LTTae,NGS maintains the correct type I error rate in all simulations (with both differential and non-differential error), whereas the other statistics show large type I error inflation at lower coverage. In the presence of non-differential error, all three statistics have approximately the same power. Applying our statistic to the downloaded 1000 Genomes data suggests a sequence misclassification rate of 1.5% across all SNPs.
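The central claim above is that a trend test which ignores differential error can reject a true null hypothesis. The sketch below illustrates why, under a deliberately simplified genotype-level error model. It assumes that the "linear trend test" being extended is the classic Cochran-Armitage trend test. The `misclassify` helper is hypothetical: it is not the coverage-based model behind LTTae,NGS. Cases and controls share identical true genotype counts, so H0 holds. Equal error rates leave the statistic at zero, while unequal rates produce a spurious trend.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Classic Cochran-Armitage linear trend test on a 2 x k genotype table.

    cases[i] / controls[i] are the (possibly fractional) counts of cases /
    controls observed with genotype i; scores[i] is that genotype's weight.
    Returns (chi-square statistic with 1 df, upper-tail p-value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # T = sum_i t_i * (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(scores, cases, controls))
    # Null variance: (R S / N) * [N * sum t_i^2 n_i - (sum t_i n_i)^2]
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    chi2 = T * T / var_T
    # Upper tail of chi-square(1): P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def misclassify(true_counts, error):
    """HYPOTHETICAL error model for three genotypes: each true genotype is
    recorded as each of the other two with probability error / 2.
    Returns the expected observed counts."""
    total = sum(true_counts)
    return [(1 - error) * c + (error / 2) * (total - c) for c in true_counts]


if __name__ == "__main__":
    true_counts = [360, 120, 20]  # same in cases and controls, so H0 is true
    same = trend_test(misclassify(true_counts, 0.05), misclassify(true_counts, 0.05))
    diff = trend_test(misclassify(true_counts, 0.05), misclassify(true_counts, 0.01))
    big = [10 * c for c in true_counts]
    diff_big = trend_test(misclassify(big, 0.05), misclassify(big, 0.01))
    print(same)      # (0.0, 1.0): equal error rates, no spurious trend
    print(diff)      # chi2 about 1.25 from differential error alone
    print(diff_big)  # chi2 about 12.5: grows linearly with sample size
```

When the case and control error rates differ, the expected statistic grows in proportion to sample size. A larger study is therefore increasingly likely to reject a true H0. An error-aware statistic such as LTTae,NGS is designed to avoid this behavior.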
Finally, the multi-variant form of LTTae,NGS shows high power under a number of simulation settings, although its power can be lower than that of the corresponding single-variant simulations, most probably because of the SNP correlation values we specified for the multi-variant simulations. In conclusion, LTTae,NGS addresses two key challenges in NGS disease studies: first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue, because its multi-variant form has only one degree of freedom and yields a single p value regardless of the number of loci.
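A multi-variant test can keep a single degree of freedom no matter how many loci are included. The sketch below shows one way to do this using a generic, swapped-in technique: a collapsing (burden-style) trend test. It is not the multi-variant LTTae,NGS, which additionally models misclassification and SNP correlation. Each individual's variant-allele counts are summed over all loci into one score. The trend test is then applied to that score using the identity that, in this formulation, the Cochran-Armitage statistic equals N r², where r is the correlation between case status and score. The function name `collapsed_trend_test` is hypothetical.

```python
import math


def collapsed_trend_test(phenotypes, genotypes):
    """Generic one-df collapsing (burden-style) trend test, shown for
    illustration only; it is NOT the multi-variant LTTae,NGS.

    phenotypes: 0 (control) or 1 (case) per individual.
    genotypes: per individual, a list of variant-allele counts (0/1/2),
               one entry per locus; any number of loci is allowed.
    Returns (chi-square statistic with 1 df, p-value).
    """
    # Collapse all loci into a single score per individual.
    scores = [sum(g) for g in genotypes]
    n = len(phenotypes)
    mean_x = sum(scores) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both phenotype and collapsed score")
    # Cochran-Armitage trend statistic written as N * r^2 (1 df).
    chi2 = n * sxy * sxy / (sxx * syy)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    phenotypes = [1, 1, 0, 0]
    genotypes = [[1, 0], [1, 1], [0, 0], [0, 1]]  # two loci per individual
    print(collapsed_trend_test(phenotypes, genotypes))  # (2.0, p of about 0.157)
```

Summing counts assumes that all variants push risk in the same direction. If protective and risk variants are mixed, they can cancel, which is a known weakness of simple burden scores.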