Accurate detection of alternative splicing and reconstruction of transcripts are essential for characterizing gene regulation and function and for understanding development and disease. However, current methods for extracting splicing variation from RNA-seq data analyze signals from only a single sample at a time, which limits transcript reconstruction and misses part of the full set of alternative splicing events. We developed JULiP, a novel feature selection method that analyzes information across multiple samples to identify alternative splicing variation in the form of splice junctions (introns). JULiP formulates the selection problem as a regularized program, using the latent information in multiple RNA-seq samples to construct an accurate and comprehensive intron set. JULiP is highly accurate: it detected thousands more introns in any one sample, >30% more than the most sensitive single-sample method, and over 11% more introns in the cumulative set of samples, at higher or comparable precision (>98%). The assemblers tested for comparison were Cufflinks, CLASS2, FlipFlop and StringTie, as well as the multi-sample assembler ISP. JULiP is multi-threaded and parallelized, taking roughly one minute to analyze up to 100 data sets on a multi-computer cluster, and can readily scale to analyses of hundreds or thousands of RNA-seq samples.