Chapter 1

Basics

Discover basic usages of the scSpotlight.

Subsections of Basics

Installation

Installation via R

One can install scSpotlight from GitHub with devtools or from r-universe. We recommend to use pak instead of traditional way, as it’s much faster.

# Install scSpotlight in R:
install.packages('scSpotlight', repos = c('https://obenno.r-universe.dev', 'https://cloud.r-project.org'))
# install.packages('pak')
pak::repo_add(scSpotlight = "https://obenno.r-universe.dev")
pak::pkg_install("scSpotlight")
# install.packages("devtools")
devtools::install_github("obenno/scSpotlight@v0.0.3")
# install.packages("pak")
pak::pkg_install("obenno/scSpotlight@v0.0.3")

Docker

To pull the latest image from the command line

docker pull registry-intl.cn-hangzhou.aliyuncs.com/thunderbio/scspotlight:0.0.3

To run the app on specific port (e.g. port:8081), please use the command below:

docker run -p 8081:8081 registry-intl.cn-hangzhou.aliyuncs.com/thunderbio/scspotlight:0.0.3 Rscript -e 'scSpotlight::run_app(options = list(port=8081, host="0.0.0.0", launch.browser = FALSE), runningMode="processing")'

Invocation

To start using scSpotlight, please use the command below and access the app via: 127.0.0.1:8081

scSpotlight::run_app(options = list(port = 8081, host = "0.0.0.0"))

The default mode of app is processing. User could also change mode to viewer, which will only allow illustrating dataset and querying gene expressions:

scSpotlight::run_app(options = list(port = 8081, host = "0.0.0.0"), runningMode = "viewer")

If one needs to load a very large dataset, use dataDir parameter to mount data directory and load Rds file directly.

scSpotlight::run_app(options = list(port = 8081, host = "0.0.0.0"), runningMode = "viewer", dataDir = "/path/to/data_directory")

Filter Cells

After uploading compressed matrix tarball, scSpotlight will automatically process the data with a standard Seurat workflow and illustrate the UMAP and cell clusters in the main panel. The number of genes detected (nFeature_RNA) in each cell, number of UMIs (nCount_RNA), percentage of the mitochondrial UMIs (percent.mt), percentage of the ribosomal protein UMIs (percent.rp) will be summarized via a violin plot in the lower information panel.

After expanding the informaiton panel, user could drag the lower right corner of the main panel to adjust the panel size.

QC metrics and filtration criteria

Here we use the same QC metrics and filtering criteria as Seurat tutorial:

  1. Discard cells/droplets with too few genes detected or too many genes.
  • Low-quality cells or empty droplets will often have very few genes
  • Cell doublets or multiplets may exhibit an aberrantly high gene count
  • Similarly, the total number of molecules detected within a cell (correlates strongly with unique genes), thus only filter nGene is sufficient
  1. Discard cells/droplets with high percentage UMIs belonging to mitochondrial mRNA
  • Low-quality / dying cells often exhibit extensive mitochondrial contamination
  • We calculate mitochondrial QC metrics with the PercentageFeatureSet() function, which calculates the percentage of counts originating from a set of features
  • We use the set of all genes starting with MT- as a set of mitochondrial genes

Discard low quality cells

User could adjust cell filtering criteria, different dataset (with varies sequencing depth) might need to use different maximum number of genes.

Adjust Clustering Parameters

In most cases, the default parameters for clustering will not always satisfy user’s requirements. scSpotlight allows users to easily choose the method of highly variable genes selection, tune dimension reduction and clustering arguments on the left side bar, thus better identify the rare cell types.

In dimension reduction step, user will have to choose an optimized number of PCA components as the input of non-linear dimension reduction (UMAP/TSNE). This value could be selected according to elbow plot in Seurat analysis workflow. User could expand the information panel to view elbow plot, Y axis is the percentage of the variance that each component could explain, X axis is the ranked components. Typically, user could select the “elbow” point. For instance, in Seurat pbmc3k tutorial, they used nDim=10.

After finding cell clusters with different resolution, the updated grouping information could be chosen from the right-side panel.

Gene Expression

The most frequent request during scRNAseq analysis might be checking genes’ expression, manually annotating cell type is to align marker gene expression to unsupervised cell clusters. This process requires user to repeatedly tune the clustering parameters and verify markers’ expression in the results. scSpotlight tries to simplify gene query process and make the whole step easier and more intuitive.

Multiple Genes

User could input multiple gene names (symbols) in the Feature Expression module and the main plot panel will switch to Seurat::FeaturePlot to represent gene expression on UMAP plot. User could also check the expression in Seurat::DotPlot, which could better dipict the percentage of cells in the cluster expressing the gene of interest.

Individual Gene

For individual genes, user could double click the subfigure of the FeaturePlot to “zoom in”. The UMP/TSNE of the cell clusters and the gene expression will be shown side by side. Hovering category labels will highlight the cells of the cluster. Meanwhile, one also could view the gene expression in the Seurat::VlnPlot after expanding the info panel.

Select Cells and Assign Identity

scSpotlight allows user to select cells of interest either by category (i.e. group.by) or manually via the lasso tool. After select cells, user will be able to assign new identity to them.

Select by category

After choosing “group.by” information from category module, all the identites of the grouping could be selected from a drop-down menu of the Rename Cluster module. For instance, here were illustrated pbmc3k dataset seurat_clusters groups, and overall expressions of the GNLY gene, which is a “Natrual Kill” (NK) cell marker. We found cluster 4 cells specifically express GNLY, thus could be assigned to NK cells. User could select cells of cluster 4 and input a new grouping name (e.g. celltype) in New Category Name, and assign cell labels “NK” in Assign As.

Unsupervised clustering result is not always perfect, and user may need to adjust cell annotation manually. One could activate lasso tools by pressing shift, then click and drag the mouse to select cells. The video below shows how to select cell of interest with lasso tool and assign new celltype labels.

History

Changelog

v0.0.2 (2024-01-20)

First Release.