
Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies


Abstract

The availability of high-throughput genomic data has created several challenges for recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of the statistical analyses. A marker-set approach such as SNP-set analysis can address these problems efficiently. To construct SNP-sets, we first propose a clustering algorithm that uses Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether given SNPs or SNP-sets should be clustered together. A dendrogram can then be constructed from this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. Based on Hamming distance, the proposed test assesses whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. The proposed methodology requires only genotype information: no inference of haplotypes is needed, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrate that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps narrow down the genetic regions that harbor susceptibility markers.
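
The two ideas in the abstract, clustering SNPs by Hamming distance and comparing distances between individuals of different versus the same disease status, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy genotype data, the 0/1/2 coding, the average-linkage rule, and the function names `hamming`, `agglomerate`, and `between_within_means` are hypothetical and are not taken from the paper. In particular, the last function only computes the two mean distances that such a test would contrast; it is not the HDAT statistic.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: each SNP is a list of genotypes over six individuals,
# coded as minor-allele counts (0, 1, 2). The coding is an assumption.
snps = {
    "snp1": [0, 1, 2, 0, 1, 0],
    "snp2": [0, 1, 2, 0, 1, 1],
    "snp3": [2, 0, 0, 1, 2, 2],
    "snp4": [2, 0, 0, 1, 2, 1],
}


def cluster_distance(c1, c2, data):
    """Average Hamming distance over all SNP pairs drawn from two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(hamming(data[a], data[b]) for a, b in pairs) / len(pairs)


def agglomerate(data, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Average linkage is an illustrative choice, not the paper's merge rule.
    """
    clusters = [(name,) for name in sorted(data)]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], data),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def between_within_means(genotypes, status):
    """Mean Hamming distance for pairs of individuals with different status
    and for pairs with the same status (assumes both kinds of pair exist)."""
    between, within = [], []
    for i, j in combinations(range(len(genotypes)), 2):
        d = hamming(genotypes[i], genotypes[j])
        (between if status[i] != status[j] else within).append(d)
    return sum(between) / len(between), sum(within) / len(within)


if __name__ == "__main__":
    print(agglomerate(snps, 2))
    # Transpose so that each row is one individual's genotypes over the SNP-set.
    individuals = list(zip(*(snps[name] for name in sorted(snps))))
    status = [1, 0, 0, 1, 1, 1]  # hypothetical case/control labels
    print(between_within_means(individuals, status))
```

In this sketch the SNPs are compared across individuals for clustering, while the individuals are compared across the SNPs of one set for the between-versus-within contrast. A real analysis would also need a way to judge whether the difference between the two means is larger than expected by chance, which the sketch does not attempt.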
