Journal: BMC Genomics
LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing

Abstract

Background: Long-read RNA-Seq techniques can generate reads that span a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-Seq data that take into account the high base-calling error rates and the presence of alignment errors.

Results: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-Seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-Seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF outperformed existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset from an acute myeloid leukemia sample and pinpointed, at base resolution, the exact location of a translocation previously known at cytogenetic resolution; this location was further validated by Sanger sequencing.

Conclusions: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.