Summary of RNA-Seq. Within the organism, genes are transcribed and (in a eukaryotic organism) spliced to produce mature mRNA transcripts (red). The mRNA is extracted from the organism, fragmented, and copied into stable double-stranded cDNA (ds-cDNA, blue). The ds-cDNA is sequenced using high-throughput, short-read sequencing methods. These sequences can then be aligned to a reference genome sequence to reconstruct which genome regions were being transcribed. These data can be used to annotate where expressed genes are located, their relative expression levels, and any alternative splice variants.[1]
RNA-Seq (an abbreviation of RNA sequencing) is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as the transcriptome.[2][3]
Specifically, RNA-Seq makes it possible to examine alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs, and changes in gene expression over time or differences in gene expression between groups or treatments.[4] In addition to mRNA transcripts, RNA-Seq can examine other populations of RNA, including total RNA and small RNA such as miRNA and tRNA, and can be used for ribosomal profiling.[5] RNA-Seq can also be used to determine exon/intron boundaries and to verify or amend previously annotated 5' and 3' gene boundaries. Recent advances in RNA-Seq include single cell sequencing, bulk RNA sequencing,[6] 3' mRNA-sequencing, in situ sequencing of fixed tissue, and native RNA molecule sequencing with single-molecule real-time sequencing.[7] Other emerging RNA-Seq applications, enabled by advances in bioinformatics algorithms, include the detection of copy number alterations, microbial contamination, transposable elements, cell type composition (deconvolution), and neoantigens.[8]
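As a rough illustration of how aligned reads translate into relative expression levels, the following minimal Python sketch counts reads that fall within annotated genes and scales the counts to counts per million (CPM). The gene names, coordinates, and read alignments are hypothetical toy values, and the containment rule used to assign reads is a simplifying assumption; production pipelines rely on dedicated aligners and quantification tools and handle splicing, strandedness, and multi-mapping reads.

```python
# Hypothetical gene annotation on a single toy chromosome:
# name -> (start, end), using 0-based, half-open coordinates.
GENES = {
    "geneA": (100, 500),
    "geneB": (800, 1000),
    "geneC": (1500, 2500),
}


def assign_read(start, end, genes=GENES):
    """Return the gene that fully contains the aligned read, or None."""
    for name, (g_start, g_end) in genes.items():
        if g_start <= start and end <= g_end:
            return name
    return None


def count_reads(alignments, genes=GENES):
    """Count aligned reads per gene; reads outside every gene are ignored."""
    counts = {name: 0 for name in genes}
    for start, end in alignments:
        gene = assign_read(start, end, genes)
        if gene is not None:
            counts[gene] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts by the total number of assigned reads (an assumption;
    some tools scale by all mapped reads instead)."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c * 1_000_000 / total for name, c in counts.items()}


# Toy alignments as (start, end) pairs; the last two fall outside all genes.
alignments = [
    (120, 170), (300, 350), (450, 500),                   # geneA
    (820, 870),                                           # geneB
    (1600, 1650), (1700, 1750), (2000, 2050), (2400, 2450),  # geneC
    (600, 650), (3000, 3050),                             # unassigned
]

raw_counts = count_reads(alignments)
cpm = counts_per_million(raw_counts)
for gene in GENES:
    print(gene, raw_counts[gene], cpm[gene])
```

CPM corrects only for sequencing depth; comparing expression between genes of different lengths additionally requires a length-aware measure such as TPM or FPKM.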
Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the sequence a priori.[9] Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of expressed sequence tag libraries, to chemical tag-based methods (e.g., serial analysis of gene expression), and finally to the current technology, next-generation sequencing of complementary DNA (cDNA), notably RNA-Seq.
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, et al. (November 2021). "Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology". Briefings in Bioinformatics. 22 (6). doi:10.1093/bib/bbab259. PMID 34329375.