
Smart-seq3xpress: a scalable, high-sensitivity, low-cost method for single-cell full-length RNA-seq

03-05-24

Single-cell sequencing resolves cellular differences at high resolution by sequencing the nucleic acids (DNA and RNA) of individual cells. It has become one of the key applications of next-generation sequencing (NGS) for understanding heterogeneity in physiological processes and differences between individuals in their response to medical treatment, and it sheds light on novel biological mechanisms and personalized therapies.
 
Currently, most single-cell RNA-sequencing (scRNA-seq) methods sequence only a short part of each RNA molecule, from either the 5′ or the 3′ end, together with a unique molecular identifier (UMI) (Mereu et al., Nat. Biotechnol., 2020). These end-counting strategies estimate gene expression efficiently across large numbers of cells while controlling for the PCR amplification bias that results from low RNA input, and they economize on sequencing capacity. Their limited coverage, however, severely restricts the detection of transcribed genetic variation and of transcript isoform expression arising from, for instance, alternative splicing. Alternative splicing is a common mechanism of RNA processing, and abnormalities in it have been associated with the development of various diseases, such as cancers and neurodegenerative diseases (Zhang et al., Nature, 2021; Garcia-Blanco et al., Nat. Biotechnol., 2004; Mills et al., Neurobiol. Aging, 2012). Moreover, although end-counting scRNA-seq methods offer higher throughput in terms of cell number, their relatively low sensitivity and high cost remain major bottlenecks. Meanwhile, long-read sequencing technologies enable direct quantification of allele- and isoform-level expression, but their current read depths, biased error rates and sample preparation procedures hinder broad application across cells, tissues and organisms (Byrne et al., Nat. Commun., 2017; Gupta et al., Nat. Biotechnol., 2018).
 
To overcome these limitations, Hagemann-Jensen and Ziegenhain et al. at Karolinska Institute (Nat. Biotechnol., 2020; Nat. Biotechnol., 2022, https://www.nature.com/articles/s41587-022-01311-4) developed and further optimized a scalable and sensitive short-read sequencing method, Smart-seq3xpress. It provides full-length RNA coverage and, more importantly, resolves individual RNA molecules to their isoform and allelic origin in single cells at unprecedented resolution, with lower operating cost and a library preparation time shortened to a single workday. The protocol places a UMI at the 5′ end of full-length RNA transcripts to give each RNA molecule a unique identity. The UMI makes it possible to detect variation in transcript copy number and to correct for the biases introduced by PCR amplification, which further improves sequencing sensitivity and specificity. Smart-seq3xpress is a plate-based method that can readily be scaled from 96- to 384-well plates, possibly with the help of automation. In practice, sorting the cells of interest into plates can be decoupled in space and time from library preparation, because filled plates can be frozen until further processing; this is a major advantage over methods in which a viable cell suspension must be processed immediately. The study also put substantial effort into miniaturizing key parameters of scRNA-seq, including lowering the reaction volume, decreasing the cDNA input, reducing the number of PCR cycles and adjusting the ratio of Tn5 to cDNA for more cost-effective tagmentation without compromising detection performance.
In addition, the authors adjusted the salt components and concentrations of the reactions, introduced new enzymes and set up an extensive RNA-count QC using their newly developed R package, UMIcountR, to further improve the overall detection power of Smart-seq3xpress. The method is also more environmentally friendly and cost-effective: it consumes ten-fold less material and resources than other RNA-seq methods, and the use of plastic consumables is significantly reduced by new 3D-printed tools, contactless reagent dispensing and pre-dispensed, desiccated index primers.
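The role of the UMI in correcting for PCR amplification can be illustrated with a minimal Python sketch. It assumes that reads have already been assigned to a cell and a gene, and it collapses reads by exact UMI match only. The function name `count_molecules` and the example reads are hypothetical and are not taken from the Smart-seq3xpress pipeline. Production pipelines typically also merge UMIs that differ by a sequencing error, a step omitted here.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into molecule counts per (cell, gene).

    `reads` is an iterable of (cell, gene, umi) tuples. Reads that share
    all three fields are treated as PCR copies of one original molecule.
    Assumption: exact UMI matching, no sequencing-error correction.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical example reads (cell, gene, UMI).
reads = [
    ("cellA", "PTPRC", "ACGTACGT"),
    ("cellA", "PTPRC", "ACGTACGT"),  # PCR duplicate of the read above
    ("cellA", "PTPRC", "TTGCAATG"),  # second molecule of the same gene
    ("cellA", "ISG20", "ACGTACGT"),  # same UMI, different gene
    ("cellB", "PTPRC", "ACGTACGT"),  # same UMI, different cell
]
counts = count_molecules(reads)
```

In this toy input, three reads of PTPRC in cellA collapse to two molecules, because two of them carry the same UMI.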

Fig. 1 | Scalable full-transcript coverage scRNA-seq with Smart-seq3xpress. a, Schematic of nanoliter cDNA synthesis reactions performed in wells of 384-well PCR plates with 3 µl of hydrophobic overlay. b, Illustration of reduced-volume experiments with the lysis, RT and PCR volumes used. c, The number of genes detected per HEK293FT cell at each reaction volume, when sampling 100,000 sequencing reads (n= 100, 19, 32 and 28 cells, respectively). P value represents a two-sided t-test between the 10-µl and 1-µl conditions. d, Influence of hydrophobic overlays on miniaturized cDNA synthesis (1 µl total volume). For each compound, boxes depict the number of genes detected per HEK293FT cell (n= 17, 34, 39, 28, 25, 24, 28, 38 and 70, respectively), subsampled at 200,000 sequencing reads per cell. e, Replacement of the bead-based cDNA cleanup by dilution in single HEK293FT (n= 58 and 52, respectively) cells. Box plots show the number of genes detected per cell and condition (at 100,000 reads) with P value for a two-sided t-test across conditions. f, Tagmentation complexity using 0.1 µl of ATM Tn5 enzyme per HEK293FT cell in relation to input cDNA. The median number of detected genes as a function of raw sequencing reads (n= 51, 53, 54, 53, 53 and 52 cells for 25, 50, 75, 100, 200 and 500 pg, respectively). g, Tagmentation complexity for varying amounts of cDNA input. Complexity was summarized as unique aligned and gene-assigned UMI-containing read pairs per 400,000 raw reads and HEK293FT cell (n= 49, 51, 51, 50, 51 and 44). h, Schematic outline of the Smart-seq3 and Smart-seq3xpress workflows. i, The number of genes detected with Smart-seq3xpress after variable amounts of pre-amplification PCR cycles. Median number of genes is reported as a function of raw sequencing reads in HEK293FT cells (n= 93, 98, 108, 113, 102, 114 and 118 cells for 10, 12, 13, 14, 15, 16 or 20 cycles, respectively).
j, Fraction of UMI-containing reads to internal reads for HEK293FT cells prepared with Smart-seq3xpress (KAPA HiFi; 12 PCR cycles), at a variable range of TDE1 Tn5 amounts (n= 64 cells each). k, Fraction of UMI-containing reads to internal reads for HEK293FT cells prepared with Smart-seq3xpress (SeqAmp; 12 PCR cycles), at a variable range of TDE1 Tn5 amounts (n= 60 cells each). l,m, Optimization of RT and PCR conditions across 376 experimental conditions on HEK293FT cells. Colors indicate particular experimental conditions: Smart-seq3xpress with Smart-seq3 TSO (purple; n= 912), 52 °C RT/alternate TSO implementation (yellow; n= 74), fixed spacer TSO variant (blue; n= 45), FLASH-seq TSO variant (green; n= 55), Smart-seq3xpress with improved TSO (pink; n= 63) and all other conditions (gray; n= 21,707). Scatter plots denote the level of artifactual TSO-UMI reads and RNA counting errors (l) as well as a percentage of ribosomal RNA (rRNA) mapped reads and number of detected genes in 100,000 reads after removal of strand invasion reads (m). n, Benchmarking of Smart-seq3 variants. Box plots show the number of genes detected per HEK293FT cell in full-volume Smart-seq3 (ref. 2), low-volume Smart-seq3 and Smart-seq3xpress implementations, at the indicated read depths (n= 109–110, 18–27, 9–170, 20–55 and 9–63 cells, depending on the cells available at the given sequencing depths). The box plots (in c, d, e, j, k and n) show the median and first and third quartiles as a box, and the whiskers indicate the most extreme data points within 1.5 lengths of the box. cSt, centistoke.
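Several panels in this caption compare the number of genes detected per cell at a fixed read depth (for example, 100,000 reads). The idea can be sketched in Python as follows, assuming each read has already been assigned to a gene. The function name `genes_detected`, the fixed random seed and the toy data are illustrative assumptions, not the authors' code. Cells with fewer reads than the target depth are skipped, which is one reason the number of cells can vary between depths.

```python
import random


def genes_detected(read_genes, depth, seed=0):
    """Count distinct genes seen after subsampling `depth` reads.

    `read_genes` lists the gene each read was assigned to (one entry per
    read). Returns None when the cell has fewer reads than `depth`, so
    that the cell can be excluded at that depth.
    """
    if len(read_genes) < depth:
        return None
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    sampled = rng.sample(read_genes, depth)  # sampling without replacement
    return len(set(sampled))


# Hypothetical cell with 10 reads assigned to 3 genes.
cell = ["PTPRC"] * 6 + ["ISG20"] * 3 + ["GUSBP11"]
full_depth = genes_detected(cell, 10)  # all reads kept
too_deep = genes_detected(cell, 11)    # cell has too few reads
```

Subsampling every cell to the same depth keeps the comparison between protocols from being driven by differences in sequencing effort.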

To demonstrate Smart-seq3xpress, Hagemann-Jensen and Ziegenhain et al. profiled 26,260 human peripheral blood mononuclear cells (hPBMCs) at an average depth of 258,000 read pairs per cell. The data provided full-length transcript profiles and allowed reconstruction of T-cell receptor sequences, identifying several cell types and states, including unconventional T-cell populations such as MAIT and gamma-delta T cells, with higher gene detection across cell types. Notably, these results were achieved with almost 10 times fewer hPBMCs than the droplet-based scRNA-seq method. In a subsequent comparison with a droplet-based method, Smart-seq3xpress performed better in several respects: single-nucleotide polymorphism (SNP) detection (9-fold higher), read support over exon–exon and exon–intron splice junctions owing to its full-length RNA coverage, and consistency of T-cell clustering. Analysis of alternative splicing in the hPBMC Smart-seq3xpress data further revealed significant variation in inclusion levels and distinct splicing patterns among cell types, independent of gene expression level. A later benchmarking study of full-length scRNA-seq (Probst et al., BMC Genomics, 2022) indicated that the Smart-seq3 protocol gave the highest gene detection per cell at a lower price than the other kits compared.

Fig. 2 | Application of Smart-seq3xpress to hPBMCs. a, Dimensional reduction (UMAP) of 26,260 hPBMC transcriptomes produced with Smart-seq3xpress (KAPA, four donors; SeqAmp with improved TSO, three donors) colored and annotated by cell type. EM, effector memory; CM, central memory; NK, natural killer; ILC, innate lymphoid cell; HSPC, hematopoietic stem and progenitor cell; MAIT, mucosal-associated invariant T cell. b, Smart-seq3xpress-based TCR reconstruction (TraCeR) overlaid onto UMAP. c, QC of TCR reconstructions obtained with Scirpy, enumerating the number of T cells with certain types of TCR reconstructions. d, Benchmarking of Smart-seq2, Smart-seq3 and Smart-seq3xpress (SeqAmp, improved TSO) in primary hPBMCs. Each cell was downsampled to 100,000 reads, and the number of detected genes from exon-mapping reads is shown for representative cell types: B cells (n= 73, 366 and 859, respectively), CD4+ T cells (n= 261, 1,270 and 1,847, respectively), CD8+ T cells (n= 76, 272 and 913, respectively) and NK cells (n= 73, 352 and 601, respectively). P values indicate results of two-sided t-tests between the Smart-seq3 and Smart-seq3xpress. e, Differential gene expression analysis (Wilcoxon test, Padj< 0.01) between naive CD4− T cell cluster (n= 2,476) and clonal CD4− T cell cluster (n= 682). Indicated are the top five TCR genes driving the clonal CD4− T cell cluster separation. f, Dot plot showing expression of selected marker genes for MAIT, gamma-delta and clonal CD4+/CD8+ T cells in all annotated clusters, with size of the dot denoting the detection of a gene within the cells of the cluster and color scale denoting the average expression level. g, Analysis of captured transcribed genetic variation in donor-matched Smart-seq3xpress and 10x Genomics 3′ version 3.1 data.
For each cell passing QC (n= 2,938 and 9,846, respectively), the number of SNPs with alternate allele coverage per cell is indicated (left) as well as the average SNP coverage normalized by the sequencing depth (right). h, Frequency of RNA-velocity-informative fully spliced reads in donor-matched Smart-seq3xpress and 10x Genomics 3′ version 3.1 data. For each cell in representative cell types, namely B cells (n= 404 and 642, respectively), CD4+ T cells (n= 1,320 and 1,317, respectively), CD8+ T cells (n= 417 and 1,181, respectively) and NK cells (n= 441 and 498, respectively), we summarized the percentage of reads spanning exon–exon junctions, with nominal P values for a two-sided t-test. i, Differential splicing isoform analysis using BRIE2. Shown is a volcano plot of tested skipped-exon events color-coded by significant variation when testing for any cell type (ELBO gain >20). ELBO gain is a surrogate for the Bayes factor (that is, the likelihood ratio of two hypotheses). The x axis denotes the effect size on the distribution of PSI values in a cell type. j, Overlays of color-coded PSI values inferred for each sequenced cell (n= 26,260) in genes with significant cell type splicing variation (PTPRC, GUSBP11 and ISG20). The box plots (in d, g and h) show the median and first and third quartiles as a box, and the whiskers show the most extreme data points within 1.5 lengths of the box. HSPC, hematopoietic stem and progenitor cell.
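The PSI (percent spliced-in) values in panels i and j express how often a skipped exon is included. As a rough illustration only, a naive point estimate from junction read counts can be sketched in Python as below. BRIE2 infers PSI with a Bayesian model, which this sketch does not reproduce. The function name `psi` and the example counts are hypothetical.

```python
def psi(upstream_inc, downstream_inc, skip):
    """Naive percent spliced-in for one skipped-exon event.

    upstream_inc / downstream_inc: reads spanning the two junctions that
    join the exon to its neighbours (evidence for inclusion).
    skip: reads spanning the junction that joins the neighbours directly
    (evidence for exclusion).
    """
    # Two junctions support inclusion but only one supports skipping,
    # so the inclusion evidence is averaged over the two junctions.
    inclusion = (upstream_inc + downstream_inc) / 2
    total = inclusion + skip
    if total == 0:
        return None  # no junction reads: PSI is undefined for this cell
    return inclusion / total


# Hypothetical counts: 30 reads on each inclusion junction, 10 skipping reads.
example = psi(30, 30, 10)  # 30 / (30 + 10) = 0.75
```

Such an estimate depends on reads that cross splice junctions, which full-length coverage captures more often than end-counting.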

The optimized protocol and data from this study demonstrate that Smart-seq3xpress produces high-quality, full-transcript-coverage scRNA-seq data that, for the first time, enable scalable, sensitive and cost-effective analysis of gene expression and alternative splicing across complex human samples at single-cell, isoform- and allele-specific resolution.