typekey/scRNA-seq_notes

 
 


scRNA-seq data analysis tools and papers

MIT License PR's Welcome

Single-cell RNA-seq related tools and genomics data analysis resources. Tools are sorted by publication date, reviews and most recent publications on top. Unpublished tools are listed at the end of each section. Please, contribute and get in touch! See MDmisc notes for other programming and genomics-related notes. See scATAC-seq_notes for scATAC-seq related resources.

Table of contents

Awesome

  • single-cell-pseudotime - an overview of single-cell RNA-seq pseudotime estimation algorithms, comprehensive collection of links to software and accompanying papers, by Anthony Gitter

Courses

  • Analysis of single cell RNA-seq data, www.singlecellcourse.org - step-by-step scRNA-seq analysis course. R-based, with code examples, explanations, exercises. From alignment (STAR) and QC (FASTQC) to introduction to R, SingleCellExperiment class, scater object, data exploration (reads, UMI), filtering, normalization (scran), batch effect removal (RUV, ComBat, mnnCorrect, GLM, Harmony), clustering and marker gene identification (SINCERA, SC3, tSNE, Seurat), feature selection (M3Drop::M3DropConvertData, BrenneckeGetVariableGenes), pseudotime analysis (TSCAN, Monocle, diffusion maps, SLICER, Ouija, destiny), imputation (scImpute, DrImpute, MAGIC), differential expression (Kolmogorov-Smirnov, Wilcoxon, edgeR, Monocle, MAST), data integration (scmap, cell-to-cell mapping, Metaneighbour, mnnCorrect, Seurat's canonical correlation analysis). Search for scRNA-seq data (scfind R package), as well as Hemberg group’s public datasets. Seurat chapter. "Ideal" scRNA-seq pipeline. Video lectures

    Paper

    Andrews, Tallulah S., Vladimir Yu Kiselev, Davis McCarthy, and Martin Hemberg. “Tutorial: Guidelines for the Computational Analysis of Single-Cell RNA Sequencing Data.” Nature Protocols, December 7, 2020.

Tutorials

Preprocessing pipelines

  • Assessment of 9 preprocessing pipelines (Cell Ranger, Optimus, salmon alevin, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2 and scruff) on 10X and CEL-Seq2 datasets (scmixology and others, 9 datasets total). All pipelines coupled with performant post-processing (normalization, filtering, etc.) produce comparable data quality in terms of clustering/agreement with known cell types. Low-expressed genes are discordant. Details and specific results of each pipeline. GitHub with pre-/postprocessing scripts

    Preprint

    You, Yue, Luyi Tian, Shian Su, Xueyi Dong, Jafar S Jabbari, Peter F Hickey, and Matthew E Ritchie. “Benchmarking UMI-Based Single Cell RNA-Sequencing Preprocessing Workflows.” Preprint. Bioinformatics, June 17, 2021.

  • Single cell current best practices tutorial, GitHub. QC (count depth, number of genes, % mitochondrial), normalization (global, downsampling, nonlinear), data correction (batch, denoising, imputation), feature selection, dimensionality reduction (PCA, diffusion maps, tSNE, UMAP), visualization, clustering (k-means, graph/community detection), annotation, trajectory inference (PAGA, Monocle), differential analysis (DESeq2, EdgeR, MAST), gene regulatory networks. Description of the bigger picture at each step, latest tools, their brief description, references. R-based Scater as the full pipeline for QC and preprocessing, Seurat for downstream analysis, scanpy Python pipeline. Links and refs to other tutorials.

    Paper

    Luecken, Malte D., and Fabian J. Theis. “Current Best Practices in Single-Cell RNA-Seq Analysis: A Tutorial.” Molecular Systems Biology 15, no. 6 (June 19, 2019)

  • Alevin - end-to-end droplet-based scRNA-seq (10X Genomics) processing pipeline performing cell barcode detection (two-step whitelisting procedure), read mapping, UMI deduplication (parsimonious UMI graphs, PUGs), resolving multimapped reads (EM method to resolve UMI collisions), gene count estimation. Intelligently handles UMI deduplication and multimapped reads, resulting in more accurate gene abundance estimation. Input - sample-demultiplexed FASTQ, output - gene-level UMI counts. Compared against the Cell Ranger, dropEst, STAR and featureCounts-based pipelines, and UMI-tools, alevin is more accurate and quantifies a greater proportion of sequenced data, especially on combined genomes. Approx. 21X faster than Cell Ranger, low memory requirements, 10-12 threads optimal. C++ implementation, part of Salmon. Alevin documentation, Tutorials that include visualization options.

    Paper

    Srivastava, Avi, Laraib Malik, Tom Smith, Ian Sudbery, and Rob Patro. “Alevin Efficiently Estimates Accurate Gene Abundances from DscRNA-Seq Data.” Genome Biology, (December 2019)

  • bigSCale - scalable analytical framework to analyze large scRNA-seq datasets, UMIs or counts. Pre-clustering, convolution into iCells, final clustering, differential expression, biomarkers. Correlation metric for scRNA-seq data based on converting expression to Z-scores of differential expression. Robust to dropouts. Matlab implementation. Data, 1847 human neuronal progenitor cells

    Paper

    Iacono, Giovanni, Elisabetta Mereu, Amy Guillaumet-Adkins, Roser Corominas, Ivon Cuscó, Gustavo Rodríguez-Esteban, Marta Gut, Luis Alberto Pérez-Jurado, Ivo Gut, and Holger Heyn. “BigSCale: An Analytical Framework for Big-Scale Single-Cell Data.” Genome Research 28, no. 6 (June 2018): 878–90.

  • CALISTA - clustering, lineage reconstruction, transition gene identification, and cell pseudotime single cell transcriptional analysis. Analyses can be all or separate. Uses a likelihood-based approach based on probabilistic models of stochastic gene transcriptional bursts and random technical dropout events, so all analyses are compatible with each other. Input - a matrix of normalized, batch-removed log(RPKM) or log(TPM) or scaled UMIs. Methods detail statistical methodology. Matlab and R version

    Paper

    Papili Gao N, Hartmann T, Fang T, Gunawan R. CALISTA: Clustering and LINEAGE Inference in Single-Cell Transcriptional Analysis. Frontiers in bioengineering and biotechnology. 2020 Feb 4;8:18.

  • demuxlet - Introduces the ‘demuxlet’ algorithm, which enables genetic demultiplexing, doublet detection, and super-loading for droplet-based scRNA-seq. Recommended approach when samples have distinct genotypes

    Paper

    Kang, Hyun Min, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, et al. “Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation.” Nature Biotechnology 36, no. 1 (January 2018): 89–94.

  • kallistobus - fast pipeline for scRNA-seq processing. New BUS (Barcode, UMI, Set) format for storing and manipulating pseudoalignment results. Includes RNA velocity analysis. Python-based

    Preprint

    Melsted, Páll, A. Sina Booeshaghi, Fan Gao, Eduardo da Veiga Beltrame, Lambda Lu, Kristján Eldjárn Hjorleifsson, Jase Gehring, and Lior Pachter. “Modular and Efficient Pre-Processing of Single-Cell RNA-Seq.” Preprint. Bioinformatics, June 17, 2019.

  • PyMINEr - Python-based scRNA-seq processing pipeline. Cell type identification, detection of cell type-enriched genes, pathway analysis, co-expression networks and graph theory approaches to interpreting gene expression. Notes on methods: modified K++ clustering, automatic detection of the number of cell types, co-expression and PPI networks. Input: .txt or .hdf5 files. Detailed analysis of several pancreatic datasets

    Paper

    Tyler, Scott R., Pavana G. Rotti, Xingshen Sun, Yaling Yi, Weiliang Xie, Michael C. Winter, Miles J. Flamme-Wiese, et al. “PyMINEr Finds Gene and Autocrine-Paracrine Networks from Human Islet ScRNA-Seq.” Cell Reports 26, no. 7 (February 2019): 1951-1964.e8.

  • SEQC - Single-Cell Sequencing Quality Control and Processing Software, a general purpose method to build a count matrix from single cell sequencing reads, able to process data from inDrop, drop-seq, 10X, and Mars-Seq2 technologies

    Paper

    Azizi, Elham, Ambrose J. Carr, George Plitas, Andrew E. Cornish, Catherine Konopacki, Sandhya Prabhakaran, Juozas Nainys, et al. “Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment.” Cell, June 2018.

  • zUMIs - scRNA-seq processing pipeline that handles barcodes and summarizes UMIs using exonic or exonic + intronic mapped reads (improves clustering, DE detection). Adaptive downsampling of oversequenced libraries. STAR aligner, Rsubread::featureCounts counting UMIs in exons and introns.

    Paper

    Parekh, Swati, Christoph Ziegenhain, Beate Vieth, Wolfgang Enard, and Ines Hellmann. “ZUMIs - A Fast and Flexible Pipeline to Process RNA Sequencing Data with UMIs.” GigaScience 7, no. 6 (01 2018).

  • STAR alignment parameters: --outFilterType BySJout, --outFilterMultimapNmax 100, --limitOutSJcollapsed 2000000, --alignSJDBoverhangMin 8, --outFilterMismatchNoverLmax 0.04, --alignIntronMin 20, --alignIntronMax 1000000, --readFilesIn fastqrecords, --outSAMprimaryFlag AllBestScore, --outSAMtype BAM Unsorted. From Azizi et al., “Single-Cell Map of Diverse Immune Phenotypes in the Breast Tumor Microenvironment.”
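As a rough illustration, the STAR parameters listed above can be assembled into a command line. The sketch below only builds the argument list and runs nothing. The `build_star_command` helper, the genome index path, and the FASTQ file name are hypothetical placeholders, and `--genomeDir` is added as an assumption because STAR needs an index location that the list above does not give.

```python
import shlex

# Alignment options as listed above (from Azizi et al.).
STAR_OPTIONS = [
    ("--outFilterType", "BySJout"),
    ("--outFilterMultimapNmax", "100"),
    ("--limitOutSJcollapsed", "2000000"),
    ("--alignSJDBoverhangMin", "8"),
    ("--outFilterMismatchNoverLmax", "0.04"),
    ("--alignIntronMin", "20"),
    ("--alignIntronMax", "1000000"),
    ("--outSAMprimaryFlag", "AllBestScore"),
]


def build_star_command(genome_dir, fastq_path):
    """Return the STAR argument list; nothing is executed here."""
    # --genomeDir is an assumption: it is not part of the list above.
    cmd = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", fastq_path]
    for flag, value in STAR_OPTIONS:
        cmd += [flag, value]
    # --outSAMtype takes two values: the output format and the sort order.
    cmd += ["--outSAMtype", "BAM", "Unsorted"]
    return cmd


# Hypothetical paths, for illustration only.
command = build_star_command("star_index/", "sample_R2.fastq")
print(shlex.join(command))
```

Keeping the options in a list of pairs makes it easy to compare them against the published parameter set before passing `command` to `subprocess.run`.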

Format conversion

  • sceasy - R package to convert different single-cell data formats to each other, supports Seurat, SingleCellExperiment, AnnData, Loom

  • scKirby - R package for automated ingestion and conversion of various single-cell data formats (SingleCellExperiment, SummarizedExperiment, HDF5SummarizedExperiment, Seurat, H5Seurat, anndata, loom, loomR, CellDataSet/monocle, ExpressionSet, and more).

  • zellkonverter - R package for conversion between scRNA-seq objects (the Bioconductor SingleCellExperiment data structure and the Python AnnData-based single-cell analysis environment). Tweet

Visualization pipelines

Quality control

Normalization

Batch correction, merging

Imputation

Assessment of 18 scRNA-seq imputation methods (model-based, smooth-based, deep learning, matrix decomposition). Similarity of scRNA- and bulk RNA-seq profiles (Spearman), differential expression (MAST and Wilcoxon), clustering (k-means, Louvain), trajectory reconstruction (Monocle 2, TSCAN), didn't test velocity. scran for normalization. Imputation methods improve correlation with bulk RNA-seq, but have minimal effect on downstream analyses. MAGIC, kNN-smoothing, SAVER perform well overall. Plate- and droplet-derived scRNA-seq cell line data (Additional File 4), Summary table of the functionality of all imputation methods (Additional File 5) - Hou, Wenpin, Zhicheng Ji, Hongkai Ji, and Stephanie C. Hicks. “A Systematic Evaluation of Single-Cell RNA-Sequencing Imputation Methods.” Genome Biology 21, no. 1 (December 2020)

Dimensionality reduction

Clustering

Spatial inference

Time, trajectory inference

Networks

  • Benchmarking four single-cell network inference methods on experimental datasets for the same biological conditions. GENIE3, GRNBoost2, PIDC, PPCOR methods, overview of each. GENIE3 (tree-based network inference, for each gene find most predictive genes using regression) appears the most reproducible. GitHub. Other benchmarking studies: Chen and Mar 2018 and Pratapa 2020

  • Benchmarking of 11 scRNA-seq network inference methods. Top performers (PEARSON, PIDC, MERLIN, SCENIC), middle (Inferelator, SCODE, LEAP, Scribe) and bottom (knnDREMI, SILGGM). Simple correlation works well. Imputation did not benefit network inference. Human, mouse, yeast data, using scRNA-seq and bulk data (minimal performance differences). Brief description of methods, gold standard, evaluation metrics

    Preprint

    Stone, Matthew, Sunnie Grace McCalla, Alireza Fotuhi Siahpirani, Viswesh Periyasamy, Junha Shin, and Sushmita Roy. “Identifying Strengths and Weaknesses of Methods for Computational Network Inference from Single Cell RNA-Seq Data.” Preprint. Bioinformatics, June 2, 2021.
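The benchmark above notes that simple correlation works well for network inference. A minimal sketch of that idea, assuming a small gene-by-cell dictionary as input: compute Pearson correlation for every gene pair and keep the strongest edges. The function names and toy values are hypothetical, and this is not the benchmark's own implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant gene: correlation undefined, treated as no edge
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, top_k):
    """expression: dict gene -> values across cells. Returns top_k edges."""
    edges = [
        (g1, g2, pearson(expression[g1], expression[g2]))
        for g1, g2 in combinations(sorted(expression), 2)
    ]
    # Rank by absolute correlation so strong negative edges are kept too.
    edges.sort(key=lambda edge: abs(edge[2]), reverse=True)
    return edges[:top_k]


# Toy expression values, made up for illustration.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 1.0, 2.0],
    "geneD": [5.0, 5.0, 5.0, 5.0],
}
network = correlation_network(toy, top_k=2)
```

The pairwise loop is quadratic in the number of genes, so a real dataset would call for a vectorized correlation matrix instead.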

RNA velocity

Differential expression

Differential abundance

  • Milo - an R package for differential abundance testing on scRNA-seq data between two groups or multiple conditions. Building a graph on the first 40 components of PCA, defining neighborhoods using a graph sampling algorithm. Each neighborhood (partially overlapping, in contrast to discrete clustering) contains cells from different conditions - differential abundance is tested using a negative binomial GLM. Tested on simulated datasets (dyntoy), a time course of mouse thymic epithelial cells development, liver cirrhosis analysis. Replicated datasets needed, batch corrected. Competitors: DA-seq, Cydar. Code to reproduce results for the paper

CNV

Annotation, subpopulation identification

  • CellTypist - machine learning tool for precise cell type annotation, immune cell types. Trained on 20 tissues with harmonized cell type labels, hierarchy of 45 cell types. L2-regularized logistic regression, machine learning framework, gradient descent, 30 epochs. Scanpy pipeline, batch correction using bbknn, markers detection using rbcde. GitHub

  • tricycle - transfer learning approach to learn cell cycle PCA projections from a reference dataset and project new data on it. Combining the biology of the cell cycle, the mathematical properties of PCA of unimodal periodicity of genes associated with cell cycle. Tweet

  • igrabski/scRNAseq-cell-type - A statistical approach for cell type annotation from scRNA-seq data. Considers all genes, uses a latent variable model to define cell-type-specific barcodes and probabilistically annotates cell type identity, while accounting for batch effects. Methods, modeling gene-specific distribution using off-low, off-high, on states from scRNA expression bimodal distribution. Train the model using reference data from the PanglaoDB https://panglaodb.se/, tested the method on the PBMC, colon, and brain scRNA-seq datasets. Clustering methods tend to overcluster, marker genes are unreliable due to sparsity. Outperforms scmap, CaSTLe, SingleR, Garnett, CellAssign. R code, Tweet

  • Azimuth - Mapping query scRNA-seq dataset to multimodal references and assigning cell types. Supervised principal component analysis to identify a projection of the query dataset that maximally captures the structure defined by the WNN graph. Combined with the anchor-based framework, allows projection on the previously defined reference UMAP visualization. Human PBMC, motor cortex, pancreas, mouse motor cortex references. Online apps

  • MARS - a meta-learning approach for identifying known and new cell types in scRNA-seq data. Constructs a meta-dataset from experiments with annotated cell types (used to learn the cell type landmarks in the embedding space) and an unannotated experiment (matched to the embedded landmarks). The embedding space and objective function are defined such that cells (annotated and unannotated) embed close to their cell-type landmarks, and cell type landmarks are most distinct. Autoencoder with 1000 and 100 neurons, input - all 22.9K genes. Applied to the Tabula Muris Senis dataset, several others. Significantly outperforms scVI, SIMLR, Scanpy and Seurat on adjusted Rand index, adjusted MI and other metrics.

  • Garnett - annotating cells in scRNA-seq data. Hierarchy of cell types and their markers should be pre-defined using a markup language. A classifier is trained to classify additional datasets. Trained on cells from one organism, can be applied to different organisms. Pre-trained classifiers available. R-based.

  • CellAssign - R package for scRNA-seq cell type inference. Probabilistic graphical model to assign cell type probabilities to single cells using known marker genes (binarized matrix), including "unassigned" categorization. Insensitive to batch- or sample-specific effects. Outperforms Seurat, SC3, PhenoGraph, densityCut, dynamicTreeCut, scmap-cluster, correlation-based methods, SCINA. Applied to delineate the composition of the tumor microenvironment. Built using TensorFlow. https://github.com/irrationone/cellassign

  • SingleCellNet - quantitative cell type annotation. Top-scoring pair transformation to match query and reference datasets. Compared with SCMAP, binary cell type classifier based on correlation. Benchmarked on 12 scRNA-seq datasets, provided in the GitHub repo, http://github.com/pcahan1/singleCellNet/. Blog post

  • scPopCorn - subpopulation identification across scRNA-seq experiments. Identifies shared and unique subpopulations. Joint network of two graphs. First, graphs are built for each experiment using co-expression to identify subpopulations. Second, the correspondence of the identified subpopulations is refined using Google's PageRank algorithm. Compared with Seurat alignment + Louvain, mutual nearest neighbor (MNN) method, and MNN + Louvain. Several assessment metrics. Tested on pancreatic, kidney cells, healthy brain and glioblastoma scRNA-seq data. Sankey diagrams showing how subpopulation assignments change. https://github.com/ncbi/scPopCorn/

  • matchSCore2 - classifying cell types based on reference data. https://github.com/elimereu/matchSCore2

  • Single-Cell Signature Explorer - gene signature (~17,000 from MSigDb, KEGG, Reactome) scoring (sum of UMIs in a gene signature over the total UMIs in a cell) for single cells, and visualization on top of a t-SNE plot. Optional Noise Reduction (Freeman-Tukey transform to stabilize technical noise). Four consecutive tools (Go language, R/Shiny). Comparison with Seurat's Cell CycleScore module and AUCell from SCENIC. Very fast. https://sites.google.com/site/fredsoftwares/products/single-cell-signature-explorer

  • SingleR - scRNA-seq cell type assignment by (Spearman) correlating to reference bulk RNA-seq data of pure cell types. Validated on ImmGen data. The package provides Human Primary Cell Atlas data, Blueprint and ENCODE consortium data, ImmGen, three others as a reference. Post-Seurat analysis. Web tool, that takes SingleR objects, instructions are on GitHub, https://github.com/dviraran/SingleR/. Example analysis, Bioconductor package, Twitter

  • TooManyCells - divisive hierarchical spectral clustering of scRNA-seq data. Uses truncated singular value decomposition to bipartition the cells. Newman-Girvan modularity Q to assess whether bipartition is significant or should be stopped. BirchBeer visualization. Outperforms Phenograph, Seurat, Cellranger, Monocle, the latter is second in performance. Excels for rare populations. Normalization marginally affects performance. https://github.com/GregorySchwartz/tooManyCellsR

  • VISION - functional annotation of scRNA-seq data using gene signatures (Geary's C statistics), unsupervised and supervised. Operates downstream of dimensionality reduction, clustering. A continuation of FastProject. https://github.com/YosefLab/VISION
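The per-cell signature score described for Single-Cell Signature Explorer above (sum of UMIs in a gene signature over the total UMIs in a cell) is simple enough to sketch directly. The function name, the dictionary input format, and the toy counts below are assumptions for illustration; the tool itself is written in Go and R/Shiny.

```python
def signature_score(cell_counts, signature):
    """Fraction of a cell's UMIs that fall in the signature genes.

    cell_counts: dict mapping gene name -> UMI count for one cell.
    signature: iterable of gene names.
    """
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0  # empty cell: no score can be computed
    in_signature = sum(cell_counts.get(gene, 0) for gene in set(signature))
    return in_signature / total


# Toy UMI counts for a single cell, made up for illustration.
cell = {"GZMA": 3, "GZMB": 2, "PRF1": 1, "ACTB": 10, "GAPDH": 4}
cytotoxic = ["GZMA", "GZMB", "PRF1", "GNLY"]
score = signature_score(cell, cytotoxic)
```

Genes in the signature that are absent from the cell contribute zero, so signatures and count matrices with different gene sets can still be scored.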

Cell markers

Phylogenetic inference

  • OncoNEM (oncogenetic nested effects model) - tumor evolution inference from single cell data from somatic SNPs of single cells. Identifies homogeneous subpopulations and infers their genotypes and phylogenetic tree. Probabilistically accounts for noise in the observed genotypes, allele dropouts, unobserved subpopulations. Input - binary genotype matrix, false positive and negative rates. Output - inferred tumor subpopulations, evolutionary tree, posterior probabilities of mutations. Assessed in simulation studies, outperforms similar methods. Robust to the selection of parameters.

Immuno-analysis

Simulation

  • scDesign - scRNA-seq data simulator and statistical framework to assess experimental design for differential gene expression analysis. Gamma-Normal mixture model better fits scRNA-seq data, accounts for dropout events (Methods describe step-wise statistical derivations). Single- or double-batch sequencing scenarios. Comparable or superior performance to simulation methods splat, powsimR, scDD, Lun et al. method. DE tested using t-test. Applications include DE methods evaluation, dimensionality reduction testing. https://github.com/Vivianstats/scDesign

  • Splatter - scRNA-seq simulator and pre-defined differential expression. 6 methods, description of each. Issues with scRNA-seq data - dropouts, zero inflation, proportion of zeros, batch effect. Negative binomial for simulation. No simulation is perfect. https://github.com/Oshlack/splatter
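To make the negative binomial simulation mentioned above more concrete, here is a minimal sketch that draws counts from a Gamma-Poisson mixture (which is equivalent to a negative binomial) and then zeroes out a fraction of entries to mimic dropouts. It is not Splatter's or scDesign's actual model; the function names, parameters, and the uniform dropout rate are all assumptions.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small means only."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_cells, gene_means, dispersion, dropout_rate, seed=0):
    """Return an n_cells x n_genes list of simulated integer counts."""
    rng = random.Random(seed)
    shape = 1.0 / dispersion  # negative binomial "size" parameter
    matrix = []
    for _ in range(n_cells):
        row = []
        for mean in gene_means:
            # Gamma-distributed rate with the requested mean, then Poisson.
            rate = rng.gammavariate(shape, mean / shape)
            count = sample_poisson(rng, rate)
            if rng.random() < dropout_rate:
                count = 0  # technical dropout, applied uniformly here
            row.append(count)
        matrix.append(row)
    return matrix


# Hypothetical gene means and parameters, for illustration only.
counts = simulate_counts(
    n_cells=5, gene_means=[1.0, 10.0, 0.5], dispersion=0.5, dropout_rate=0.1
)
```

Fixing the seed keeps simulated datasets reproducible, which matters when the simulation is used to compare differential expression methods.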

Power

  • SCOPIT - Shiny app for estimating the number of cells that must be sequenced to observe cell types in a single-cell sequencing experiment. By Alexander Davis

  • How many cells do we need to sample so that we see at least n cells of each type. By Satija's lab.

  • scPower - an R package for power calculation for single-cell RNA-seq studies. Estimates power of differential expression and eQTLs using zero-inflated negative binomial distribution. Also, power to detect rare cell types. Figure 1 shows the dependence among experimental design parameters. Tested on several datasets, generalizes well across technologies. GitHub and Shiny app

  • powsimR - an R package for simulating scRNA-seq datasets and assessing the performance of differential analysis methods. Supports Poisson, Negative Binomial, and zero-inflated NB distributions, or estimates parameters from user-provided data. Simulates differential expression with pre-defined fold changes and estimates power, TPR, FDR, and sample size, including for a user-provided dataset. https://github.com/bvieth/powsimR
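The question behind the cell-number calculators above (how many cells to sample to see at least n cells of a given type) can be sketched for a single cell type with a binomial model. This is a simplification: it ignores the joint multinomial over several types that the tools above address, and the function names and default confidence level are assumptions.

```python
from math import comb


def prob_at_least(n_sampled, frequency, min_cells):
    """P(at least min_cells of the type among n_sampled cells), binomial."""
    below = sum(
        comb(n_sampled, k) * frequency**k * (1 - frequency) ** (n_sampled - k)
        for k in range(min_cells)
    )
    return 1.0 - below


def cells_needed(frequency, min_cells, confidence=0.95, max_cells=100_000):
    """Smallest sample size reaching the requested confidence."""
    for n in range(min_cells, max_cells + 1):
        if prob_at_least(n, frequency, min_cells) >= confidence:
            return n
    raise ValueError("confidence not reached within max_cells")


# Hypothetical scenario: a cell type at 5% frequency, at least 5 cells wanted.
required = cells_needed(frequency=0.05, min_cells=5)
print(required)
```

The linear search is deliberately naive; for rare types and large n a bisection over the sample size would be faster.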

Benchmarking

Deep learning

  • Solo - semi-supervised deep learning for doublet identification. Variational autoencoder (scVI) followed by a classifier to detect doublets. Compared with Scrublet and DoubletFinder, improves area under the precision-recall curve.

  • scover - de novo identification of regulatory motifs and their cell type-specific importance from scRNA-seq or scATAC-seq data. Shallow convolutional neural network on one-hot encoded sequence data, k-fold training and selecting the best-performing network, extracting motifs from convolutional filters, clustering them, matching them with motifs, associating with peak strength/gene expression. Applied to human kidney scRNA-seq data, Tabula Muris, mouse cerebral cortex SNARE-seq data. Docs, Tweet

  • SAVER-X - denoising scRNA-seq data using deep autoencoder with a Bayesian model. Decomposes the variation into three components: 1) predictable, 2) unpredictable, 3) technical noise. Pretrained on the Human Cell Atlas project, 10X Genomics immune cells, allows for human-mouse cross-species learning. Improves clustering and the detection of differential genes. Outperforms downsampling, MAGIC, DCA, scImpute.

  • scVI - low-dimensional representation of scRNA-seq data used for batch correction, imputation, clustering, differential expression. Deep neural networks to approximate the distributions that underlie observed expression values. Zero-inflated negative binomial distribution conditioned on the batch annotation and unobserved random variables. Compared with DCA, ZINB-WAVE on simulated and real large and small datasets. Perspective by Way & Greene https://github.com/YosefLab/scVI
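The zero-inflated negative binomial (ZINB) distribution mentioned for scVI above can be written down for fixed parameters. In scVI the parameters are produced by neural networks conditioned on latent variables and batch; the sketch below skips all of that and only evaluates the probability mass function. The parameter names (`mean`, `theta`, `dropout`) are assumptions.

```python
import math


def zinb_pmf(y, mean, theta, dropout):
    """P(Y = y) for a zero-inflated negative binomial.

    mean: NB mean (> 0); theta: NB inverse dispersion (> 0);
    dropout: probability of an extra, structural zero.
    """
    log_nb = (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mean))
        + y * math.log(mean / (theta + mean))
    )
    nb = math.exp(log_nb)
    # Zeros come from two sources: the dropout component and the NB itself.
    zero_inflation = dropout if y == 0 else 0.0
    return zero_inflation + (1.0 - dropout) * nb


# Hypothetical parameter values, for illustration only.
p_zero = zinb_pmf(0, mean=2.0, theta=1.5, dropout=0.3)
p_three = zinb_pmf(3, mean=2.0, theta=1.5, dropout=0.3)
```

Working in log space with `math.lgamma` avoids overflow in the gamma functions for large counts.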

Spatial transcriptomics

Technology

  • Collections of library structure and sequence of popular single cell genomic methods from Sarah Teichmann's group. GitHub

  • Drop-seq technology - single cells encapsulated in lipid droplets with nanoparticles with cell- and UMI barcodes. Barcoding strategy, "split-and-pool" synthesis cycles to synthesize 12bp cell barcodes, then 8bp UMI synthesis (Figure 1). Majority of droplets are empty, doublets depend on initial cell concentration. Example on a mixture of 589 human HEK and 412 mouse 3T3 cells. Expression profiles from 49,300 retinal cells profiled using Drop-seq. 13,155 largest libraries, reduce dimensionality by PCA to 32 components (decided by permutation), tSNE for visualization. 39 clusters matched to known cell types. GEO GSE63473
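As a small illustration of the barcode layout described above (a 12bp cell barcode followed by an 8bp UMI), the sketch below splits a barcode read into its two parts and computes the size of each barcode space. It assumes the barcode read starts with the cell barcode and that the UMI follows immediately; the function name and example sequence are hypothetical.

```python
CELL_BARCODE_LEN = 12  # bp, from the split-and-pool synthesis cycles
UMI_LEN = 8            # bp


def split_barcode_read(read):
    """Return (cell_barcode, umi) from the start of a barcode read."""
    needed = CELL_BARCODE_LEN + UMI_LEN
    if len(read) < needed:
        raise ValueError(f"read shorter than {needed} bases")
    cell_barcode = read[:CELL_BARCODE_LEN]
    umi = read[CELL_BARCODE_LEN:needed]
    return cell_barcode, umi


# Number of distinct sequences each barcode type can encode.
n_cell_barcodes = 4 ** CELL_BARCODE_LEN  # 16,777,216
n_umis = 4 ** UMI_LEN                    # 65,536

# Made-up read, for illustration only.
barcode, umi = split_barcode_read("ACGTACGTACGTTTTTGGGG")
```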

10X Genomics

10X QC

Data

Human

Cancer

  • CancerSEA - cancer scRNA-seq studies. Download individual studies, as well as gene signatures (from Angiogenesis, DNA damage to EMT, metastasis, etc.)

  • scTIME Portal - a database and an exploration/analysis portal for single cell transcriptomes of tumor immune microenvironment. Cell clusters, expression of selected genes, data/image download. Links to other portals/databases.

  • Multi-omics single-cell analysis of breast cancer. >130K scRNA-seq across 11 ER+, 5 HER2+ and 10 TNBC primary breast tumors. Immunophenotyping by CITE-seq. 10X Visium Spatial transcriptomics. SCSubtype signatures - subtype classification (Basal, Her2E, LumA, LumB), Supplementary Table 4. Recurrent gene modules (GMs) driving neoplastic cell heterogeneity. Supplementary Table 5 - gene lists for 7 GMs. DScore (BIRC5, CCNB1, CDC20, NUF2, CEP55, NDC80, MKI67, PTTG1, RRM2, TYMS and UBE2C) and proliferation score. The cytotoxic gene list containing effector cytotoxic proteins (GZMA, GZMB, GZMH, GZMK, GZMM, GNLY, PRF1 and FASLG) and cytotoxic T cell activation markers (IFNG, TNF, IL2R and IL2). Bioinformatics methods: inferCNV, Stereoscope, CIBERSORTx, Monocle 2, CITE-seq-Count, DWLS. Code for all tools on GitHub. Supplementary Tables, Processed data, GEO GSE176078, Spatially resolved transcriptomics data

  • scRNA-seq of healthy breast, GEO GSE164898. Five freshly collected samples, 18K cells, 20K genes. 13 epithelial cell clusters. Breast cancers may originate from 3 luminal mature and 1 progenitor subclusters. TBX3 and PDK4 subclassify ER+ breast cancers in at least four subtypes. Table S2 - cluster-specific gene expression. Matrices in HDF5 format

  • Interactive mammary cell gene expression atlas - Integrated 50K mouse and 24K human mammary epithelial cell atlases, scRNA-seq. Consensus lineage trajectory - embryonic stem cells differentiate into three epithelial lineages (Basal, luminal hormone-sensing L-Hor, luminal alveolar L-Alv). Integration of four public and one new datasets. Harmony, LIGER, scALIGN for integration. STREAM for lineage tracing. ssGSVA for gene set enrichment. Supplementary Data: Supplementary Data 4 - mouse gene signatures of MaSC (mammary stem cells), Basal, LA-Pro (Luminal Alveolar progenitors), L-Alv, LH-Pro (Luminal Hormone-sensing), L-Hor. Supplementary Data 10 - mouse/human-specific and common stem/basal/Alv/Hor lineage genes.

  • scRNA-seq of breast cancer. >340,000 cells, normal breast, preneoplastic tissue, the major breast cancer subtypes, and pairs of tumors and involved lymph nodes. 34 treatment-naive primary tumors. Transition from preneoplastic to tumor involves immune microenvironment shift. ER+ tumors are different. 10X Genomics, hg38 alignment with CellRanger, integration with Seurat, classification by cell cycle markers, PAM50, immune cell-specific and other signatures. edgeR::read10X to read files, TMM normalization, limma-voom and TREAT for differential analysis. InferCNV to infer CNVs from scRNA-seq data. GEO GSE161529 - scRNA-seq data in MatrixMarket format (edgeR::read10X), 69 samples. GEO GSE161892 - Bulk RNA-seq (luminal progenitors (LP), mature luminal (M), basal, stromal populations)

  • scRNA-seq of breast cancer, four women, human mammary epithelial cells. Marker-free algorithm (LandSCENT) that identifies stem-like bipotent state, characterized by YBX1 and ENO1, two modulators of breast cancer risk. Source data 6B - 12- and 72 gene signature of bipotent state, basal-like. GEO GSE113197 - scRNA-seq data, annotated.

  • scRNA-seq of >25K normal human breast epithelial cells from seven individuals. Three cell populations, one basal and two luminal (secretory L1 and hormone-responsive L2). Within luminal L1, three cell states (milk production, secretory, epithelial keratin expression), but the combined analysis reports L1_1 and L1_2 signatures. Fluidigm, 10X Genomics, Seurat, Monocle analyses. GitHub, GEO GSE113197. Supplementary Data 2 - myoepithelial gene signature to stratify basal cells into either "Basal" or "Myoepithelial" grouping; scRNA-seq derived "Basal", "Basal_myoepithelial", "L1_1", "L1_2", "L_2", "Unclassified" signatures; Metabric-derived LumA and LumB signatures

  • scRNA-seq of immune cells in BRCA - continuous activation of T cells, no macrophage polarization. inDrop and 10X platforms. 47,016 CD45+ cells from 8 primary breast carcinomas. 83 clusters, tested by cross-validation. GEO GSE114727, GEO GSE114724
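Several datasets above are distributed as MatrixMarket files, which the entries read with edgeR::read10X in R. For readers working in Python, the following is a minimal sketch of a parser for the coordinate variant of that format. It does not handle the symmetric or dense array variants, the function name is hypothetical, and the example content is made up.

```python
import io


def read_matrix_market(handle):
    """Parse a coordinate-format MatrixMarket stream.

    Returns ((n_rows, n_cols), entries) where entries maps zero-based
    (row, column) index pairs to values.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("only coordinate-format matrices are handled")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(field) for field in line.split())
    entries = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        # The file uses one-based indices; convert to zero-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    return (n_rows, n_cols), entries


# Made-up 3 x 2 matrix with three non-zero entries.
toy_file = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% toy example\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
shape, entries = read_matrix_market(toy_file)
```

In 10x-style output the rows are typically genes and the columns cell barcodes, with the names stored in separate files alongside the matrix.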

Mouse

Brain

Links

Papers

  • Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M. Mauck, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. “Comprehensive Integration of Single-Cell Data.” Cell, (June 2019) - Seurat v.3 paper. Integration of multiple scRNA-seq and other single-cell omics (spatial transcriptomics, scATAC-seq, immunophenotyping), including batch correction. Anchors as reference to harmonize multiple datasets. Canonical Correlation Analysis (CCA) coupled with Mutual Nearest Neighbors (MNN) to identify shared subpopulations across datasets. CCA to reduce dimensionality, search for MNN in the low-dimensional representation. Shared Nearest Neighbor (SNN) graphs to assess similarity between two cells. Outperforms scmap. Extensive validation on multiple datasets (Human Cell Atlas, STARmap mouse visual cortex spatial transcriptomics, Tabula Muris, 10X Genomics datasets, others in STAR methods). Data normalization, variable feature selection within- and between datasets, anchor identification using CCA (methods), their scoring, batch correction, label transfer, imputation. Methods correspond to details of each Seurat function. Preprocessing of real single-cell data. GitHub with code for the paper

  • The single cell studies database, over 1000 studies. Main database, Tweet by Valentine Svensson

  • scATACdb - list of scATAC-seq studies, Google Sheet by Caleb Lareau

  • Journal club on single-cell multimodal data technology and analysis - Data science seminar led by Levi Waldron

  • Review of single-cell sequencing technologies, individual and combined, technical details of each. Combinatorial indexing. Genomic DNA, methylomes, histone modifications, open chromatin, 3D genomics, proteomics, spatial transcriptomics. Table 1 - multiomics technologies, summary. Areas of application, in cancer and cell atlases. Future development, e.g., single-cell metabolomics.

  • sciRNA-seq - single-cell combinatorial indexing RNA-seq technology and sequencing of C. elegans, ~49,000 cells, 27 cell types. Data and R code to download it at http://atlas.gs.washington.edu/hub/
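The Seurat v3 entry in this list describes searching for mutual nearest neighbors (MNN) between datasets in a low-dimensional representation. The sketch below shows the MNN idea only, using Euclidean distance on toy two-dimensional points. It omits the CCA step and anchor scoring, the function names are hypothetical, and it is not Seurat's implementation.

```python
from math import dist


def nearest_neighbors(queries, targets, k):
    """For each query point, the set of indices of its k closest targets."""
    result = []
    for query in queries:
        order = sorted(
            range(len(targets)), key=lambda j: dist(query, targets[j])
        )
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(data_a, data_b, k=1):
    """Pairs (i, j) where a[i] and b[j] are in each other's k nearest."""
    a_to_b = nearest_neighbors(data_a, data_b, k)
    b_to_a = nearest_neighbors(data_b, data_a, k)
    return sorted(
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in neighbors
        if i in b_to_a[j]
    )


# Toy coordinates standing in for cells in a shared low-dimensional space.
dataset_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
dataset_b = [(0.5, 0.0), (5.0, 6.0), (20.0, 20.0)]
pairs = mutual_nearest_neighbors(dataset_a, dataset_b, k=1)
```

Cells without a mutual partner, such as the third point in each toy dataset, produce no pair, which is how the approach tolerates populations present in only one dataset.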
