Journal: Briefings in Bioinformatics

Copy number aberrations from Affymetrix SNP 6.0 genotyping data: how accurate are commonly used prediction approaches?



Abstract

Copy number aberrations (CNAs) are known to strongly affect oncogenes and tumour suppressor genes. Given the critical role CNAs play in cancer research, it is essential to identify them accurately in tumour genomes. One particular challenge in finding CNAs is the effect of confounding variables. To address this issue, we assessed how commonly used CNA identification algorithms perform on SNP 6.0 genotyping data in the presence of confounding variables. We simulated realistic synthetic data with varying levels of three confounding variables: the tumour purity, the length of a copy number region, and the CNA burden (the percentage of CNAs present in a profiled genome). On these data we evaluated the performance of OncoSNP, ASCAT, GenoCNA, GISTIC and CGHcall. Furthermore, we implemented and assessed CGHcall*, an adjusted version of CGHcall that accounts for high CNA burden. Our analysis of the synthetic data indicates that tumour purity and CNA burden strongly influence the performance of all the algorithms. No algorithm correctly finds lost and gained genomic regions across all tumour purities. The length of CNA regions influenced the performance of ASCAT, CGHcall and GISTIC, whereas OncoSNP, GenoCNA and CGHcall* showed little sensitivity to it. Overall, CGHcall* and OncoSNP showed reasonable performance, particularly in samples with high tumour purity. Our analysis of the HapMap data revealed a good overlap between the CGHcall, CGHcall* and GenoCNA results and experimentally validated data. Our exploratory analysis of the TCGA HNSCC data showed that CGHcall, CGHcall* and GISTIC gave plausible results in consensus HNSCC CNA regions. Code is available at https://github.com/adspit/PASCAL.
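
To make the confounders concrete, here is a minimal Python sketch of why tumour purity weakens the signal that CNA callers rely on, and of one way to compute a CNA burden. It is not the authors' simulation code (that lives in the repository linked above). The function names are hypothetical, the mixture formula assumes a diploid normal baseline and ignores array-specific noise and ploidy normalisation, and the burden is computed by segment length, which is only one possible reading of the definition in the abstract.

```python
import math


def expected_log2_ratio(tumour_copies, purity, normal_copies=2):
    """Expected log2 copy-number ratio for a tumour/normal cell mixture.

    Assumption: normal cells are diploid and the measured intensity is
    proportional to the average copy number across all cells in the sample.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumour_copies + (1.0 - purity) * normal_copies
    if mixed == 0:
        # Homozygous deletion in a 100% pure sample: no signal at all.
        return float("-inf")
    return math.log2(mixed / normal_copies)


def cna_burden(segments, neutral_copies=2):
    """Percentage of profiled length whose copy number differs from neutral.

    `segments` is a list of (start, end, copies) tuples (hypothetical format).
    """
    total = sum(end - start for start, end, _ in segments)
    if total == 0:
        raise ValueError("segments cover no length")
    altered = sum(end - start for start, end, copies in segments
                  if copies != neutral_copies)
    return 100.0 * altered / total


if __name__ == "__main__":
    # A single-copy loss becomes harder to see as purity drops.
    for purity in (1.0, 0.7, 0.4, 0.1):
        print(purity, round(expected_log2_ratio(1, purity), 3))
    print(cna_burden([(0, 100, 2), (100, 150, 1), (150, 200, 3)]))
```

Under these assumptions a one-copy loss yields a log2 ratio of -1 in a pure sample but only about -0.415 at 50% purity, which is consistent with the finding that performance depends strongly on tumour purity.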
