Scientific Reports

Evaluating information content of SNPs for sample-tagging in re-sequencing projects



Abstract

Sample-tagging is designed to identify accidental sample mix-ups, a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as that of the present world, and that only 30 optimized SNPs are sufficient in practice to label up to 100 thousand individuals. In simulated populations of 100 thousand individuals, the average Hamming distance generated by the optimized set of 30 SNPs is larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection that allows customized SNP numbers and genes of interest. A sample-tagging plan based on this framework will improve the reliability and cost-effectiveness of re-sequencing projects.
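To give a feel for the quantities in the abstract, the sketch below estimates the information content of a SNP panel. It is not the authors' model or their selection program. It assumes biallelic SNPs, genotype frequencies in Hardy-Weinberg equilibrium, independence between SNPs, and Shannon entropy as the information measure. All function names are hypothetical.

```python
import math

def genotype_freqs(p):
    """Genotype frequencies (AA, Aa, aa) for allele frequency p.
    Assumption: Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

def snp_entropy_bits(p):
    """Shannon entropy, in bits, of one SNP's genotype distribution."""
    return -sum(f * math.log2(f) for f in genotype_freqs(p) if f > 0.0)

def mismatch_prob(p):
    """Probability that two unrelated individuals differ at this SNP."""
    return 1.0 - sum(f * f for f in genotype_freqs(p))

def expected_hamming(panel):
    """Expected genotype-level Hamming distance between two individuals.
    Assumption: SNPs in the panel are independent (no linkage)."""
    return sum(mismatch_prob(p) for p in panel)

def identical_pair_prob(panel):
    """Probability that two unrelated individuals match at every SNP."""
    return math.prod(1.0 - mismatch_prob(p) for p in panel)

# Hypothetical panel: 30 independent SNPs, each with allele frequency 0.5.
panel = [0.5] * 30
print(sum(snp_entropy_bits(p) for p in panel))  # 45.0 bits in total
print(expected_hamming(panel))                  # 18.75
print(identical_pair_prob(panel))               # tiny chance of a full match
```

Under these assumptions a SNP carries the most information at an allele frequency of 0.5 (1.5 bits). A panel of 30 such SNPs gives an expected pairwise Hamming distance of 18.75, in line with the "larger than 18" figure reported for the optimized 30-SNP set. Real panels will fall short of this idealized bound because of linkage and population structure.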
