TCGAbiolinks-downloader

This workflow uses the TCGAbiolinks package to download data from the NCI's Genomic Data Commons (GDC).

All files are stored as <cohort>.RData in their respective analysis directories.

Requirements

The following software is required to run this workflow:

Optionally, the following R packages can be used for post-processing:

  • edgeR - for log2 cpm transformation of RNA-seq reads
  • DESeq2 - for variance stabilizing transformation of RNA-seq reads
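Before post-processing, it can help to confirm that the optional packages are actually installed. The following is a minimal sketch, assuming `Rscript` is on the `PATH`; the helper name `has_r_package` is hypothetical and not part of this workflow.

```shell
#!/bin/sh
# Sketch only: has_r_package is a hypothetical helper, not part of this workflow.
# Exit status: 0 if the package can be loaded, 1 if it cannot, 2 if Rscript is missing.
has_r_package() {
    command -v Rscript >/dev/null 2>&1 || return 2
    # requireNamespace() returns TRUE/FALSE without attaching the package;
    # the result is turned into the exit status of the R process.
    Rscript -e "quit(status = as.integer(!requireNamespace('$1', quietly = TRUE)))" \
        >/dev/null 2>&1
}

# Example (commented out so that sourcing this file runs nothing):
# for pkg in edgeR DESeq2; do
#     has_r_package "$pkg" || echo "optional package not available: $pkg" >&2
# done
```

Because both packages are optional, a missing package only matters for the corresponding transformation; the download itself does not depend on this check.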

Downloading the data

There are three options to download and save TCGA data:

# Download everything
make # add the -j<n> flag to run n data sets in parallel

# Selection by cohort
# - see projects.txt for valid cohorts
make <cohort> # e.g. 'TCGA-LUAD' for lung adenocarcinoma

# Selection by data type
# - valid types are: snv_mutect2, rna_seq_raw, cnv_segments, mirna_seq, clinical
make <data type> # e.g. 'clinical' for downloading clinical data

Data will be stored as RData files (containing a data.frame or SummarizedExperiment object) for each cohort in the respective data type directories.
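The cohort option can be wrapped in a small script to fetch several cohorts in one go and to locate the resulting files. This is a sketch under stated assumptions, not part of the workflow: it assumes projects.txt lists one cohort ID per line and that each data type directory is named after its make target. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions not confirmed by this README:
#   - projects.txt lists one cohort ID per line (e.g. TCGA-LUAD)
#   - each data type directory is named after its make target
# MAKE can be overridden (e.g. MAKE="make -n") for a dry run.

# Succeed if cohort $1 appears as a whole line in projects file $2.
is_valid_cohort() {
    grep -qxF -- "$1" "$2"
}

# Print the path where data type $1 for cohort $2 is expected to be stored.
rdata_path() {
    printf '%s/%s.RData\n' "$1" "$2"
}

# Run the cohort target for every valid cohort given after the projects file.
download_cohorts() {
    projects_file=$1
    shift
    for cohort in "$@"; do
        if is_valid_cohort "$cohort" "$projects_file"; then
            ${MAKE:-make} "$cohort"
        else
            echo "skipping unknown cohort: $cohort" >&2
        fi
    done
}

# Example (commented out so that sourcing this file downloads nothing):
# download_cohorts projects.txt TCGA-LUAD TCGA-LUSC
# ls -l "$(rdata_path clinical TCGA-LUAD)"
```

Setting `MAKE="make -n"` makes the script print what make would run without downloading anything, which is a cheap way to check the cohort names first.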

Additional documentation

The data processing steps underlying the downloaded data are fully documented on the GDC webpage.