Conference: Euromicro International Conference on Parallel, Distributed and Network-Based Processing

ParallNormal: An Efficient Variant Calling Pipeline for Unmatched Sequencing Data


Abstract

Next-generation sequencing is moving closer to clinical application in oncology, because it allows the identification of tumor-specific mutations acquired during cancer development, progression, and resistance to therapy. As sequencing technology evolves, novel computational approaches are needed to rapidly process sequencing data into a list of clinically relevant genomic variants. Because sequencing data from both a tumor and its matched normal sample are not always available (unmatched data), there is a need for a computational pipeline that performs variant calling on unmatched data. Although many accurate and precise variant calling algorithms exist, an efficient approach is still lacking. Here, we propose a parallel pipeline (ParallNormal) designed to efficiently identify genomic variants from whole-exome sequencing data in the absence of the matched normal. ParallNormal integrates well-known tools such as BWA and GATK, a novel tool for duplicate removal (DuplicateRemove), and the FreeBayes variant calling algorithm. A re-engineered implementation of FreeBayes, optimized for execution on modern multi-core architectures, is also proposed. ParallNormal was applied to whole-exome sequencing data from pancreatic cancer samples without considering their matched normal samples. The robustness of ParallNormal was tested against results obtained from the same dataset analyzed with matched normal samples, focusing on genes involved in pancreatic carcinogenesis. Our pipeline was able to confirm most of the variants identified using matched normal data.
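The comparison described at the end of the abstract, checking how many variants found with matched normal data are also recovered without it, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Variant` tuple, the `confirmed_fraction` helper, and the toy call sets are all hypothetical, and a variant is assumed to be identified by chromosome, position, reference allele, and alternate allele.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def confirmed_fraction(matched: Set[Variant], unmatched: Set[Variant]) -> float:
    """Fraction of matched-normal variants that also appear in the unmatched call set."""
    if not matched:
        return 0.0
    return len(matched & unmatched) / len(matched)


# Toy call sets; positions and alleles are made up for illustration only.
matched_calls = {
    Variant("chr12", 100, "C", "T"),
    Variant("chr17", 200, "G", "A"),
    Variant("chr9", 300, "A", "G"),
    Variant("chr18", 400, "T", "C"),
}
unmatched_calls = {
    Variant("chr12", 100, "C", "T"),
    Variant("chr17", 200, "G", "A"),
    Variant("chr9", 300, "A", "G"),
    Variant("chr1", 500, "G", "T"),  # extra call that a matched normal might have filtered
}

print(confirmed_fraction(matched_calls, unmatched_calls))  # 0.75
```

In practice the two sets would be read from the variant call files produced by each analysis, and the comparison could be restricted to a gene list of interest before computing the fraction.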
