Pipeline for allele-specific variant calling from RNA-seq data. It has been optimized for ERCC exRNA samples on the BRL-BCM HPC.
- Construction of the diploid genomes requires the following dependencies:
  - gnomAD frequency database hg38_gnomad30_genome.txt, available through the ANNOVAR main package
  - vcf2diploid
  - CrossMap
- The variant calling pipeline requires the following dependencies:
  - STAR, GATK, Picard, SAMtools, BEDTools
  - hg38 reference file, reference annotation file, and known variant sites files such as 1000G_phase1.snps.high_confidence.hg38.vcf.gz and Mills_and_1000G_gold_standard.indels.hg38.vcf.gz, all available from the GATK resource bundle
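Before submitting any job, it can help to confirm that the command-line dependencies are visible on the current PATH. The following is a minimal sketch, not part of the pipeline itself. The executable names (`STAR`, `gatk`, `samtools`, `bedtools`) are assumptions about how these tools are installed; Picard, vcf2diploid, and CrossMap are left out because they are often run as jar files or under installation-specific names.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed executable names; adjust to match the local installation or modules.
ASSUMED_EXECUTABLES = ("STAR", "gatk", "samtools", "bedtools")


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(ASSUMED_EXECUTABLES).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

On a cluster that uses environment modules, the relevant modules would need to be loaded before running this check.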
- getBedCoveragePerStudy.pbs.txt: generates a unioned coverage-depth BED file across ERCC studies.
- getSumBedCoveragePerStudy.pbs.txt: uses unionbedgByStudy.R to summarize covered regions by study.
- maskRefGenome.pbs.txt: masks uncovered genome regions.
- Convert the unioned ERCC genome coverage bedGraphs from hg19 to hg38.
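To illustrate what masking uncovered regions means, here is a small sketch that keeps bases inside covered intervals and replaces everything else with `N`. It is an illustration only and does not reproduce maskRefGenome.pbs.txt. The function names are hypothetical, and the intervals are assumed to be 0-based, half-open coordinates as in BED files.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # assumed 0-based, half-open [start, end), as in BED


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_uncovered(sequence: str, covered: Iterable[Interval],
                   mask_char: str = "N") -> str:
    """Return the sequence with every base outside the covered intervals masked."""
    masked = [mask_char] * len(sequence)
    for start, end in merge_intervals(covered):
        # Clip intervals to the sequence bounds before copying bases back.
        start = max(start, 0)
        end = min(end, len(sequence))
        if start < end:
            masked[start:end] = sequence[start:end]
    return "".join(masked)


if __name__ == "__main__":
    # Toy example: only positions 2..5 are covered.
    print(mask_uncovered("ACGTACGTAC", [(2, 4), (3, 6)]))
```

For whole genomes, a dedicated tool that works directly on FASTA and BED files is the more practical choice; the sketch only shows the logic on a single in-memory sequence.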
ppl_vCall_ref_Hr4Rm47Prc4.pbs: calls extracellularly expressed variants with minimal reference bias.