New methods enable new discoveries. My time as a PhD student has run in parallel with the maturation of the RNA-seq method, and I have used it to discover basic properties of gene expression and transcriptomes. My part has been bioinformatics: the computational analysis of biological data.

RNA-seq quantifies the expression of all genes in a single experiment. This allows discoveries without prior knowledge, as opposed to testing hypotheses one gene at a time. When I started my PhD, genome-wide expression was measured with microarrays followed by qRT-PCR validation, which can be arduous. Unlike microarrays, RNA-seq quantifies expression in absolute terms, with little ambiguity about which gene each expression value corresponds to. At the time, however, RNA-seq data analysis was full of unknowns and little software was available. Today, partly as a result of my work, the data analysis is much less complicated. RNA-seq can also be performed on minute samples, down to single cells, which was not feasible with microarrays.

My first study (Paper I) used one of the very first RNA-seq datasets to study general features of transcriptomes, such as the mean mRNA length (~1,500 nt) and the number of genes expressed per tissue (~13,000). I also found features specific to some tissues. The liver transcriptome is dominated by a few highly expressed genes, the brain expresses especially long mRNAs, and the testis expresses many more genes than other tissues.

Following this tissue RNA-seq study, I evaluated a new library preparation method for single-cell RNA-seq (Paper III), developed before single-cell RNA-seq became widespread. Using technical replicates, I showed that the method was accurate and reliable for the more highly expressed genes at single-cell RNA levels. With input RNA amounts corresponding to >50 cells, it produced data of the same quality as bulk RNA-seq. The method was then applied to melanoma cells isolated from human blood, and I listed surface antigen genes that distinguished these circulating tumour cells from other cells in the blood.

This single-cell RNA-seq method was then applied to pre-implantation embryo cells (Paper IV). Using first-generation crosses between two mouse strains, I could separate the expression of the maternal and paternal copies of each gene. I found that 12-24% of genes express only one of their two copies in any given cell. This happens in a random manner that affects almost all expressed genes. I also found that the two copies are expressed independently of each other.

Finally, I studied Sox transcription factors during neural development (Paper II). I combined RNA-seq and microarray data for different cell types with ChIP-seq data for transcription factor binding and histone modifications. I found that Sox proteins bind to the enhancers active in the stem cells where the Sox proteins function. They also bind to enhancers specific to cells that arise later in development. In addition, different Sox factors bind largely the same enhancers, and they can induce histone modifications.

In conclusion, my work has advanced the RNA-seq method and increased our understanding of transcriptional regulation and output.