a As bulk RNA-Seq data does not suffer from technical dropouts and is much more reliable than scRNA-Seq data, for a given choice of tissue, we use the high-powered GTEX bulk RNA-Seq expression set (>20,000 genes, 8555 samples, 30 tissue types) to derive a corresponding tissue-specific regulatory network, consisting of a gold-standard list of tissue-specific transcription factors (TFs) and their targets (regulons). The inference of the network uses a greedy partial correlation framework, while also adjusting for stromal (immune cell) contamination within the tissue. b Power/Sensitivity (SE) estimates to detect tissue-specific TFs in the GTEX bulk RNA-Seq dataset as a function of the minor cell-type fraction (MCF) (left), the number of samples in the tissue of interest (middle), and the average fold change (FC) of differential expression between the tissue of interest and the remaining tissues in GTEX (right). In the left panel, we depict SE curves for four tissue types in GTEX (the number of samples in each tissue is given), assuming an average FC of 8. In the middle panel, we depict SE curves for two MCF values, as indicated. In the right panel, we assume a sample size of 150. An MCF value of 0.05 means we assume that the tissue-specific TFs are highly expressed in only 5% of the tissue-resident cells. c Given the high technical dropout rate and overall noisy nature of scRNA-Seq data, it may not be possible to reliably infer regulatory activity from the TF expression profile alone. However, using the TF regulons derived in a, and using the genes within the regulon that are not strongly affected by dropouts, we can estimate regulatory activity across single cells. Depicted is an example with three lung-specific TFs (Sox18, Tbx4, Foxa2), as well as the expression pattern of the regulon genes for Tbx4, in the context of a lung development study from embryonic day 10 to adult stage (Treutlein dataset). 
For each cell, we regress the expression values of all genes against the corresponding TF-regulon profile and take the t-statistic of the estimated regression coefficient as the activity of the TF, resulting in a regulatory activity map over the tissue-specific TFs and single cells. The same tissue-specific TFs and their regulons can be applied to normal-cancer scRNA-Seq datasets to infer regulatory activity maps across normal and cancer cells.
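The regression step in panel c can be illustrated with a minimal sketch. It assumes the regulon is encoded as a vector over genes with +1 for activated targets, -1 for repressed targets, and 0 for non-targets, and that the expression matrix has genes in rows and cells in columns. The function name `tf_activity`, the encoding, and the toy data are illustrative assumptions, not the authors' implementation; any normalization applied before the regression is omitted here.

```python
import numpy as np

def tf_activity(expr, regulon):
    """Per-cell t-statistic of the slope from regressing expression on a regulon profile.

    expr    : array (genes x cells), expression values (assumed already normalized)
    regulon : array (genes,), assumed encoding: +1 activated, -1 repressed, 0 non-target
    returns : array (cells,), one activity score (t-statistic) per cell
    """
    expr = np.asarray(expr, dtype=float)
    x = np.asarray(regulon, dtype=float)
    n = x.size
    xc = x - x.mean()                        # centered predictor
    sxx = np.sum(xc ** 2)
    yc = expr - expr.mean(axis=0)            # center each cell's expression vector
    slope = xc @ yc / sxx                    # least-squares slope, one per cell
    resid = yc - np.outer(xc, slope)         # residuals of the simple linear fit
    sigma2 = np.sum(resid ** 2, axis=0) / (n - 2)
    return slope / np.sqrt(sigma2 / sxx)     # slope divided by its standard error

# Toy data (hypothetical): 50 genes, 2 cells.
# Cell 0 follows the regulon pattern, cell 1 follows its inverse.
rng = np.random.default_rng(0)
regulon = np.array([1] * 10 + [-1] * 5 + [0] * 35, dtype=float)
noise = rng.normal(scale=0.5, size=(50, 2))
expr = np.column_stack([2 * regulon, -2 * regulon]) + noise
t = tf_activity(expr, regulon)
```

In this toy example the first cell receives a positive activity score and the second a negative one. Repeating the calculation for each TF regulon gives one row per TF, which together form an activity map over TFs and cells.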