UCell support #158

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -75,6 +75,7 @@ Suggests:
monocle3,
CoGAPS,
glmpca,
UCell,
Nebulosa,
presto,
flexmix,
3 changes: 2 additions & 1 deletion README.md
@@ -31,4 +31,5 @@ remotes::install_github('satijalab/seurat-wrappers')
| Nebulosa | [Visualization of gene expression with Nebulosa](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/nebulosa.html) | Jose Alquicira-Hernandez and Joseph E. Powell, _Under Review_ | https://github.com/powellgenomicslab/Nebulosa |
| CIPR | [Using CIPR with human PBMC data](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/cipr.html) | Ekiz et. al., BMC Bioinformatics 2020 | https://github.com/atakanekiz/CIPR-Package |
| miQC | [Running miQC on Seurat objects](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/miQC.html) | Hippen et. al., bioRxiv 2021 | https://github.com/greenelab/miQC |
| tricycle | [Running estimate_cycle_position from tricycle on Seurat Objects](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/tricycle.html) | Zheng et. al., bioRxiv 2021 | https://www.bioconductor.org/packages/release/bioc/html/tricycle.html |
| tricycle | [Running estimate_cycle_position from tricycle on Seurat Objects](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/tricycle.html) | Zheng et. al., bioRxiv 2021 | https://www.bioconductor.org/packages/release/bioc/html/tricycle.html |
| UCell | [Single-cell gene signature scoring](http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/ucell.html) | Andreatta & Carmona, CSBJ 2021 | https://bioconductor.org/packages/release/bioc/html/UCell.html |
164 changes: 164 additions & 0 deletions docs/ucell.Rmd
@@ -0,0 +1,164 @@
---
title: "Single-cell signature scoring with UCell"
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
output:
  github_document:
    html_preview: true
    toc: true
    toc_depth: 3
  html_document:
    df_print: kable
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE
)
```



This vignette demonstrates how to run UCell on single-cell datasets stored as Seurat objects.
If you use UCell in your research, please cite:

> *UCell: Robust and scalable single-cell gene signature scoring*
>
> Massimo Andreatta and Santiago J. Carmona
>
> Computational and Structural Biotechnology Journal (2021)
>
> DOI: https://doi.org/10.1016/j.csbj.2021.06.043
>
> Website: [GitHub](https://github.com/carmonalab/UCell) and [Bioconductor](https://bioconductor.org/packages/release/bioc/html/UCell.html)

# Overview

In single-cell RNA-seq analysis, gene signature (or “module”) scoring is a simple yet powerful approach to evaluate the strength of biological signals, typically associated with a specific cell type or biological process, in a transcriptome.

UCell is an R package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with Seurat objects.
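
To make the rank-based idea concrete, here is a minimal base-R sketch of a Mann-Whitney U-style score for a single cell. It is not UCell's implementation: the function name `toy_u_score`, the rank cap `max_rank`, and the normalization by `n * max_rank` follow one reading of the UCell paper and should be treated as assumptions.

```r
# Toy U-statistic signature score for a single cell (illustrative only).
# `expr` is a named numeric vector of expression values for one cell.
toy_u_score <- function(expr, signature, max_rank = 1500) {
  # Rank genes from highest (rank 1) to lowest expression
  r <- rank(-expr, ties.method = "average")
  # Cap the uninformative tail of lowly expressed genes (assumed behaviour)
  r <- pmin(r, max_rank + 1)
  n <- length(signature)
  # Mann-Whitney U statistic from the rank sum of the signature genes
  u <- sum(r[signature]) - n * (n + 1) / 2
  # Rescale so that top-ranked signatures score highest (assumed normalization)
  1 - u / (n * max_rank)
}

cell <- c(A = 5, B = 3, C = 0, D = 1, E = 4)
toy_u_score(cell, c("A", "E"), max_rank = 4)  # the two top-ranked genes
toy_u_score(cell, c("C", "D"), max_rank = 4)  # the two lowest-ranked genes
```

A signature whose genes occupy the top ranks scores higher than one whose genes sit in the tail. Because only the ranks within each cell are used, the score of a cell does not depend on the other cells in the dataset.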


# Installation and setup

UCell is available from [Bioconductor](https://bioconductor.org/packages/release/bioc/html/UCell.html):
```{r results=FALSE, message=FALSE}
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("UCell")
```

Load the required packages:
```{r}
library(Seurat)
library(SeuratData)
library(UCell)
```

# Get some testing data

For this demo, we will use a small dataset of human PBMCs distributed with [SeuratData](https://github.com/satijalab/seurat-data):
```{r message=FALSE, warning=FALSE, results='hide'}
InstallData("pbmc3k")
data("pbmc3k")
pbmc3k
```

# Define gene signatures

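Here we define some simple gene sets based on the "Human Cell Landscape" signatures from [Han et al. (2020) Nature](https://www.nature.com/articles/s41586-020-2157-4). You may edit existing signatures or add new ones as elements of the list.

Note that UCell supports gene sets with both positive and negative genes. A gene name followed by `-` (e.g. `CD3D-`) marks a negative marker, i.e. a gene expected to be absent or lowly expressed in the target population: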
```{r}
signatures <- list(
  Tcell = c("CD3D", "CD3E", "CD3G", "CD2", "TRAC"),
  Myeloid = c("CD14", "LYZ", "CSF1R", "FCER1G", "SPI1", "LCK-"),
  NK = c("KLRD1", "NCAM1", "NKG7", "CD3D-", "CD3E-"),
  Bcell = c("MS4A1", "BANK1", "PAX5", "CD19")
)
```
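
Before scoring, it can help to separate the positive and negative parts of a signature and to check that its genes are present in the dataset. The sketch below does this in base R; `split_signature` and `missing_genes` are hypothetical helpers written for this illustration (they are not part of UCell), and it assumes that a trailing `+` is also accepted as an explicit positive marker.

```r
# Hypothetical helper: split a signature into positive and negative genes
# based on a trailing "+" or "-" (no suffix is treated as positive).
split_signature <- function(genes) {
  negative <- grepl("-$", genes)
  clean <- sub("[+-]$", "", genes)
  list(positive = clean[!negative], negative = clean[negative])
}

# Hypothetical helper: list the signature genes that are absent from the dataset.
missing_genes <- function(signatures, available) {
  lapply(signatures, function(genes) {
    setdiff(sub("[+-]$", "", genes), available)
  })
}

nk <- c("KLRD1", "NCAM1", "NKG7", "CD3D-", "CD3E-")
split_signature(nk)
missing_genes(list(NK = nk), available = c("KLRD1", "NKG7", "CD3D", "CD3E"))
```

With a Seurat object, `available` would typically be `rownames(pbmc3k)`.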

# Run UCell on Seurat object

```{r message=FALSE, warning=FALSE}
pbmc3k <- AddModuleScore_UCell(pbmc3k, features = signatures, name = NULL)
head(pbmc3k[[]])
```

Generate PCA and UMAP embeddings:
```{r message=FALSE, warning=FALSE}
pbmc3k <- pbmc3k |>
  NormalizeData() |>
  FindVariableFeatures(nfeatures = 500) |>
  ScaleData() |>
  RunPCA(npcs = 20) |>
  RunUMAP(dims = 1:20)
```

Visualize the UCell scores on the UMAP embedding:
```{r fig.width=12, fig.height=8, dpi=60}
library(ggplot2)
library(patchwork)

FeaturePlot(pbmc3k, reduction = "umap", features = names(signatures)) &
  theme(aspect.ratio = 1)
```

# Signature smoothing

Single-cell data are sparse. It can be useful to 'impute' scores from neighboring cells to partially correct for this sparsity. The function `SmoothKNN` smooths single-cell scores by taking a weighted average over the k-nearest neighbors in a given dimensionality reduction. It can be applied directly to Seurat objects to smooth UCell scores:

```{r}
pbmc3k <- SmoothKNN(pbmc3k,
                    signature.names = names(signatures),
                    reduction = "pca")
```

```{r fig.width=12, dpi=60}
FeaturePlot(pbmc3k, reduction = "umap", features = c("Bcell", "Bcell_kNN")) &
  theme(aspect.ratio = 1)
```
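
The weighted-average idea behind kNN smoothing can be sketched in a few lines of base R. This is a toy illustration rather than the `SmoothKNN` implementation: the function name `knn_smooth` is hypothetical, and the inverse-distance weights are an assumption chosen for simplicity (UCell's actual weighting scheme may differ).

```r
# Toy kNN smoothing of per-cell scores in a low-dimensional embedding.
knn_smooth <- function(embedding, scores, k = 5) {
  d <- as.matrix(dist(embedding))  # pairwise Euclidean distances between cells
  vapply(seq_len(nrow(d)), function(i) {
    # The k + 1 closest cells: normally the cell itself plus its k nearest neighbors
    nn <- order(d[i, ])[seq_len(k + 1)]
    # Assumed weighting: closer cells contribute more to the average
    w <- 1 / (1 + d[i, nn])
    sum(w * scores[nn]) / sum(w)
  }, numeric(1))
}

set.seed(1)
emb <- matrix(rnorm(200), ncol = 2)  # 100 simulated cells in 2 dimensions
scores <- rbinom(100, 1, 0.3)        # sparse 0/1 scores
smoothed <- knn_smooth(emb, scores, k = 10)
summary(smoothed)
```

Each smoothed value is a weighted mean of the original scores, so it stays within their range, and cells with a score of zero borrow signal from nearby cells with a non-zero score.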

Smoothing (or imputation) was designed for UCell scores, but it can be applied to any other data or metadata. For instance, we can perform kNN smoothing directly on gene expression measurements:

```{r warning=FALSE, fig.width=12, fig.height=8, dpi=60}
genes <- c("CD2", "CD19")
pbmc3k <- SmoothKNN(pbmc3k, signature.names = genes,
                    assay = "RNA", reduction = "pca", k = 20, suffix = "_smooth")

# Original expression (RNA assay)
DefaultAssay(pbmc3k) <- "RNA"
a <- FeaturePlot(pbmc3k, reduction = "umap", features = genes) &
  theme(aspect.ratio = 1)
# kNN-smoothed expression (RNA_smooth assay)
DefaultAssay(pbmc3k) <- "RNA_smooth"
b <- FeaturePlot(pbmc3k, reduction = "umap", features = genes) &
  theme(aspect.ratio = 1)
# Stack the two rows of plots with patchwork
a / b
```

# Multi-core processing

If your machine has multiple cores and enough RAM, running UCell in parallel can considerably speed up your analysis. The example below runs on 4 cores in parallel:

```{r}
BPPARAM <- BiocParallel::MulticoreParam(workers=4)
pbmc3k <- AddModuleScore_UCell(pbmc3k, features=signatures, BPPARAM=BPPARAM)
```
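
The number of workers and the backend are best matched to the machine. The sketch below is one possible way to do this: `pick_workers` is a hypothetical helper (not part of UCell or BiocParallel), and leaving one core free is an arbitrary choice. It relies on the fact that forked workers (`MulticoreParam`) are not available on Windows, where `SnowParam` can be used instead.

```r
# Hypothetical helper: cap the requested number of workers by what the machine
# offers, leaving one core free for the rest of the system.
pick_workers <- function(requested = 4, available = parallel::detectCores()) {
  if (is.na(available)) return(1L)  # core count could not be determined
  as.integer(max(1, min(requested, available - 1)))
}

n_workers <- pick_workers(4)

# Forked workers (MulticoreParam) are not available on Windows,
# so use socket-based workers (SnowParam) there.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  BPPARAM <- if (.Platform$OS.type == "windows") {
    BiocParallel::SnowParam(workers = n_workers)
  } else {
    BiocParallel::MulticoreParam(workers = n_workers)
  }
}
```

The resulting `BPPARAM` object can then be passed to `AddModuleScore_UCell` in the same way as in the example above.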

# Resources

Please report any issues at the [UCell GitHub repository](https://github.com/carmonalab/UCell).

More demos are available on [Bioconductor](https://bioconductor.org/packages/release/bioc/html/UCell.html) and at the [UCell demo repository](https://github.com/carmonalab/UCell_demo).

If you find UCell useful, you may also check out the [scGate package](https://github.com/carmonalab/scGate), which relies on UCell scores to automatically purify populations of interest based on gene signatures.

See also [SignatuR](https://github.com/carmonalab/SignatuR) for easy storing and retrieval of gene signatures.

# Session Info

```{r}
sessionInfo()
```



716 changes: 716 additions & 0 deletions docs/ucell.html

Large diffs are not rendered by default.
