Journal: BMC Genomics

ChimPipe: accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data


Abstract

Background: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome. They can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, the outputs of different programs on the same dataset can be widely inconsistent, and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, result inconsistencies between simulated and real datasets, and assessment at the gene level rather than the junction level.

Results: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points with validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on these data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high-confidence chimeras connecting the protein-coding sequences of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which were recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role. Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved.

Conclusions: ChimPipe combines discordant paired-end reads and split-reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.
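The abstract evaluates tools on exact junction coordinates and reports a trade-off between sensitivity and precision. As a rough illustration of what junction-level scoring could look like, the sketch below counts a prediction as correct only if both breakpoints match a validated junction exactly. This is not ChimPipe's code or the paper's benchmarking script; the `Junction` type, the `junction_level_scores` function and the toy coordinates are all hypothetical and invented for this example.

```python
from typing import NamedTuple, Set, Tuple


class Junction(NamedTuple):
    """A chimeric junction as a pair of exact breakpoints (hypothetical type)."""
    donor_chrom: str
    donor_pos: int
    acceptor_chrom: str
    acceptor_pos: int


def junction_level_scores(
    predicted: Set[Junction], truth: Set[Junction]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) using exact-coordinate matching."""
    true_positives = len(predicted & truth)
    # Sensitivity: fraction of validated junctions that were recovered.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # Precision: fraction of predictions that are validated junctions.
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy data (made-up coordinates, for illustration only).
truth = {
    Junction("chr1", 1000, "chr2", 5000),
    Junction("chr3", 2500, "chr3", 9000),
}
predicted = {
    Junction("chr1", 1000, "chr2", 5000),  # exact match
    Junction("chr3", 2501, "chr3", 9000),  # off by one base: not a match
}

sensitivity, precision = junction_level_scores(predicted, truth)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Under gene-level assessment the second prediction would count as correct, because it links the same two loci. Exact junction matching is the stricter criterion, which is why the two assessment levels can rank tools differently.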