TCGAbiolinks-downloader

This workflow is using the TCGAbiolinks package to download data from the NCI's Genomic Data Commons.

All files are stored as <cohort>.RData in their respecitive analysis directories.

Requirements

The following software is required to run this workflow:

A recent version of R
The TCGAbiolinks package from Bioconductor
GNU make

Optionally, the following R packages for post-processing:

edgeR - for log2 cpm transformation of RNA-seq reads
DESeq2 - for variance stabilizing transformation of RNA-seq reads

Downloading the data

The are three options to download and save TCGA data:

# Download everything
make # add the -j<n> flag to run n data sets in parallel

# Selection by cohort
# - see projects.txt for valid cohorts
make <cohort> # eg. 'TCGA-LUAD' for lung adenocarcinoma

# Selection by data type
# - valid types are: snv_mutect2, rna_seq_raw, cnv_segments, mirna_seq, clinical
make <data type> # eg. 'clinical' for downloading clinical data

Data will be stored as RData files (containing a data.frame or SummarizedExperiment object) for each cohort in the respective data type directories.

Additional documentation

The data processing steps underlying the data being downloaded is fully documented on the GDC webpage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TCGAbiolinks-downloader

Requirements

Downloading the data

Additional documentation

Files

README.md

Latest commit

History

README.md

File metadata and controls

TCGAbiolinks-downloader

Requirements

Downloading the data

Additional documentation