Cram files? #15

jjfarrell · 2018-03-13T00:44:44Z

Does spark-bam handle cram files? If so, how does the reference get specified?

ryan-williams · 2018-03-13T19:01:32Z

I went through the motions of passing .cram loading through to hadoop-bam, but haven't tested it! You'd just call sc.loadReads, as with a .bam.

IIUC, you'd specify relevant options like reference path the same way you do in hadoop-bam, e.g. as a property on the Hadoop Configuration (i.e. SparkContext.hadoopConfiguration).
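As a rough, untested sketch of the Hadoop Configuration approach described above: the helper below only builds the key/value pair to copy onto the configuration. The object name `CramHadoopProps` and the method `forReference` are hypothetical, and the property key is an assumption (it is believed to be the value of hadoop-bam's `CRAMInputFormat.REFERENCE_SOURCE_PATH_PROPERTY`), so check it against the hadoop-bam version you are running.

```scala
// Sketch only: collects the Hadoop properties a CRAM read is assumed to need.
object CramHadoopProps {
  // ASSUMPTION: key that hadoop-bam's CRAMInputFormat reads the reference from
  // (REFERENCE_SOURCE_PATH_PROPERTY); verify against your hadoop-bam version.
  val ReferenceSourcePathKey = "hadoopbam.cram.reference-source-path"

  // Build the key/value pairs to copy onto a Hadoop Configuration.
  def forReference(referenceUri: String): Map[String, String] = {
    require(referenceUri.nonEmpty, "reference URI must not be empty")
    Map(ReferenceSourcePathKey -> referenceUri)
  }
}

// Hypothetical usage with a live SparkContext `sc` (not constructed here):
//   CramHadoopProps.forReference("file:///path/to/reference.fa")
//     .foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }
//   // ...then call sc.loadReads on the .cram path, as with a .bam
```

Keeping the properties in a plain Map makes them easy to inspect or log before they are applied, which may help when debugging a missing-reference error.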

Making those properties proper method-params to be more idiomatic Scala would be nice.

Feel free to post the results of trying it!

Alternatively, your application code can decide to call hadoop-bam or spark-bam based on the file's extension 🙁
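A minimal sketch of that extension-based fallback, assuming the application only needs to tell .bam from .cram. The names `ReadLoaderChoice`, `SparkBam`, `HadoopBam`, and `choose` are hypothetical, and the actual loading calls are left to the caller.

```scala
object ReadLoaderChoice {
  // Which library the application should hand the file to.
  sealed trait Loader
  case object SparkBam extends Loader   // e.g. sc.loadReads
  case object HadoopBam extends Loader  // e.g. hadoop-bam's input formats directly

  // Pick a loader from the file name's extension (case-insensitive).
  // Returns None for extensions this sketch does not know about.
  def choose(path: String): Option[Loader] = {
    val lower = path.toLowerCase
    if (lower.endsWith(".bam")) Some(SparkBam)
    else if (lower.endsWith(".cram")) Some(HadoopBam)
    else None
  }
}
```

The caller would pattern-match on the result and invoke the corresponding library, treating `None` as an unsupported input.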

ryan-williams · 2018-03-17T04:52:29Z

Hey @jjfarrell, I looked through the related posts broadinstitute/gatk#4506 and HadoopGenomics/Hadoop-BAM#196 (comment) and am curious to dig in a little bit.

Is there a public .cram file you can point me at? I couldn't tell whether your adni/cram/ADNI_002_S_0413.hg38.realign.bqsr.cram is available anywhere.

jjfarrell · 2018-03-17T13:12:24Z

@ryan-williams That CRAM is not available. However, I am working on making a CRAM of a GIAB sample available for testing.

jjfarrell · 2018-03-17T15:45:58Z

@ryan-williams

Here is a publicly available CRAM from 1000 Genomes. Again, I found the Spark GATK v4.0.2.1 job was quite slow processing this CRAM.

gatk FlagStatSpark --input 1000g/cram/HG00419.alt_bwamem_GRCh38DH.20150917.CHS.high_coverage.cram --reference file:///restricted/projectnb/casa/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa -- --spark-runner SPARK --spark-master yarn

Here are the URLs for the CRAM, its index, and the reference:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHS/HG00419/high_cov_alignment/HG00419.alt_bwamem_GRCh38DH.20150917.CHS.high_coverage.cram
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHS/HG00419/high_cov_alignment/HG00419.alt_bwamem_GRCh38DH.20150917.CHS.high_coverage.cram.crai
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
