Journal of Bioinformatics and Computational Biology

Population clustering based on copy number variations detected from next generation sequencing data

Abstract

Copy number variations (CNVs) can serve as significant biomarkers, and next generation sequencing (NGS) enables high-resolution detection of them. However, extracting features from CNVs and applying them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs detected from NGS data. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix using non-negative matrix factorization (NMF). The source matrix consists of common CNVs shared by all samples from the same group, and the weight matrix indicates the corresponding level of those CNVs in each sample. Therefore, applying NMF to CNVs makes it possible to differentiate samples from different ethnic groups, i.e., to perform population clustering. To validate the approach, we applied it to both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second show that the method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
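
The decomposition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the feature matrix is laid out as regions by samples, uses the standard Lee-Seung multiplicative updates for NMF, and assigns each sample to the component with the largest weight, which is one simple clustering rule among several possible ones. The toy matrix `V`, the function names `nmf` and `cluster_samples`, and all parameter values are hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (regions x samples) as W @ H with non-negative factors.

    W plays the role of the source matrix (regions x groups) and H the role
    of the weight matrix (groups x samples). Uses Lee-Seung multiplicative
    updates for the squared-error objective (an assumed choice of solver).
    """
    rng = np.random.default_rng(seed)
    n_regions, n_samples = V.shape
    W = rng.random((n_regions, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(H):
    """Assign each sample (column of H) to its highest-weight component."""
    return np.argmax(H, axis=0)


# Hypothetical toy data: 6 genomic regions x 6 samples. Samples 0-2 carry
# CNVs in regions 0-2, samples 3-5 carry CNVs in regions 3-5.
V = np.array([
    [2, 3, 2, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [3, 2, 3, 0, 0, 0],
    [0, 0, 0, 2, 1, 2],
    [0, 0, 0, 3, 3, 2],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

W, H = nmf(V, rank=2)
labels = cluster_samples(H)
print("cluster labels per sample:", labels)
```

The component indices returned by NMF are arbitrary, so only the grouping of samples is meaningful, not which group receives label 0 or 1. For real data, choosing the rank and handling noise would need more care than this sketch shows.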
