Conference: IEEE International Conference on Computational Advances in Bio and Medical Sciences

JULiP: An efficient model for accurate intron selection from multiple RNA-seq samples


Abstract

Accurate alternative splicing detection and transcript reconstruction are essential to characterize gene regulation and function and to understand development and disease. However, current methods for extracting splicing variation from RNA-seq data analyze signals from only a single sample, which limits transcript reconstruction and fails to detect a complete set of alternative splicing events. We developed a novel feature selection method, JULiP, that analyzes information across multiple samples to identify alternative splicing variation in the form of splice junctions (introns). It formulates the selection problem as a regularized program, utilizing the latent information from multiple RNA-seq samples to construct an accurate and comprehensive intron set. JULiP is highly accurate, and could detect thousands more introns in any one sample, >30% more than the most sensitive single-sample method, and over 11% more introns in the cumulative set of samples, at higher or comparable precision (>98%). Tested assemblers included Cufflinks, CLASS2, FlipFlop and StringTie, and the multi-sample assembler ISP. JULiP is multi-threaded and parallelized, taking roughly one minute to analyze up to 100 data sets on a multi-computer cluster, and can easily scale up to allow analyses of hundreds or thousands of RNA-seq samples.
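
The abstract does not spell out the regularized program, so the following is only a toy illustration of the general idea of regularized selection over pooled multi-sample evidence, not JULiP's actual formulation. It assumes a simple L1-penalized objective, 0.5*(w - m)^2 + lam*|w|, whose minimizer is the soft-threshold of m, where m is the mean of log(1 + count) across samples. The intron identifiers, the read counts, the pooling rule, and the names `soft_threshold` and `select_introns` are all hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Closed-form minimizer of 0.5*(w - value)**2 + lam*abs(w)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def select_introns(counts_by_intron, lam=1.0):
    """Keep introns whose pooled evidence survives the L1 penalty.

    counts_by_intron maps a (hypothetical) intron id to a list of
    junction read counts, one entry per sample.
    """
    selected = {}
    for intron, counts in counts_by_intron.items():
        # Pool evidence across samples: mean of log(1 + count).
        pooled = sum(math.log1p(c) for c in counts) / len(counts)
        weight = soft_threshold(pooled, lam)
        if weight > 0.0:
            selected[intron] = weight
    return selected


# Made-up counts for three candidate introns in four samples.
toy_counts = {
    "chr1:100-200": [12, 9, 15, 11],  # strong support everywhere
    "chr1:100-350": [2, 3, 1, 4],     # weak but recurring support
    "chr1:400-900": [1, 0, 0, 0],     # a single read in one sample
}
chosen = select_introns(toy_counts, lam=1.0)
```

In this toy setting the second intron has low support in every individual sample yet is retained because the evidence recurs across samples, while the third, seen once in one sample, is shrunk to zero. The penalty `lam` is an arbitrary choice here and would need tuning on real data.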
