Journal: Genetic Epidemiology

On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows


Abstract

For the association analysis of whole-genome sequencing (WGS) studies, we propose a fast and efficient spatial-clustering algorithm. Existing analysis approaches for WGS data define the tested regions by either sliding or consecutive windows of fixed size along the variants. Grouping nearby variants into meaningful consecutive regions has two advantages: compared to sliding-window approaches, the number of tested regions is likely to be smaller, and compared to consecutive fixed-window approaches, nearby variants are more likely to be grouped together. Given existing biological evidence that disease-associated mutations tend to cluster physically in specific regions along the chromosome, identifying meaningful groups of nearby variants could thus lead to a power gain for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process, and groups nearby variants together. Because parameters are estimated locally, the algorithm takes the varying variant density along the chromosome into account and provides a locally optimal partitioning of variants into consecutive regions. An R implementation of the algorithm is provided. We discuss the theoretical advantages of our algorithm over existing window-based approaches and show its performance and advantages in a simulation study and in an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and formally achieves genome-wide significance. Software Implementation: An implementation of the algorithm in R is available at: .
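
To make the idea of position-based partitioning concrete, the following is a minimal illustrative sketch in Python, not the authors' R implementation and not their exact algorithm. It assumes a simple stand-in rule: under a Poisson process, gaps between neighboring variants are approximately exponentially distributed, so a gap is treated as a region boundary when it exceeds the upper `alpha` quantile of an exponential distribution whose mean is estimated from the surrounding gaps. The function name `partition_positions` and the parameters `half_width` and `alpha` are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def partition_positions(
    positions: List[int], half_width: int = 5, alpha: float = 0.05
) -> List[List[int]]:
    """Split variant positions into consecutive, nonoverlapping groups.

    Illustrative rule (an assumption, not the published method): a gap
    starts a new group when it is unusually large relative to the mean
    of the neighbouring gaps, using an exponential upper quantile.
    """
    if not positions:
        return []
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    groups = [[pos[0]]]
    for i, gap in enumerate(gaps):
        # Local estimate: mean of up to `half_width` gaps on each side,
        # excluding the gap under test.
        lo = max(0, i - half_width)
        hi = min(len(gaps), i + half_width + 1)
        neighbours = gaps[lo:i] + gaps[i + 1:hi]
        local_mean = sum(neighbours) / len(neighbours) if neighbours else gap
        # Exponential tail: P(gap > t) = exp(-t / mean) = alpha.
        threshold = -local_mean * math.log(alpha)
        if gap > threshold:
            groups.append([])  # boundary: open a new region
        groups[-1].append(pos[i + 1])
    return groups


if __name__ == "__main__":
    toy = [100, 110, 120, 130, 140, 5000, 5010, 5020, 5030, 5040]
    for region in partition_positions(toy):
        print(region[0], region[-1], len(region))
```

Because the mean gap is re-estimated around every candidate boundary, the threshold adapts to the local variant density, which is the property the abstract attributes to local parameter estimation. Each variant lands in exactly one group, so the resulting regions are consecutive and nonoverlapping by construction.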
