Skip to content

Latest commit

 

History

History
50 lines (40 loc) · 1.92 KB

TUTORIAL.md

File metadata and controls

50 lines (40 loc) · 1.92 KB

Tutorial

As described in the README a small dataset (30 samples by 300 targets) is distributed as part of the XHMM tutorial. Execute the following commands to call CNVs in that dataset with DECA. A video of this workflow running on OSX is available. More detail about the individual steps, parameters, etc. is included in the README.

  1. Install DECA as described in the README:

    git clone [email protected]:bigdatagenomics/deca.git
    

    Assuming that Maven and Spark are already installed (with $SPARK_HOME defined), build DECA with native support.

    cd deca
    export MAVEN_OPTS="-Xmx512m"
    mvn package -P native-lgpl
    

    Add the deca-submit script to PATH and create an environment variable pointing to the DECA jar.

    export PATH=$PATH:$PWD/bin
    export DECA_JAR=$PWD/deca-cli/target/deca-cli_2.11-0.2.1-SNAPSHOT.jar 
    
  2. Download and prepare XHMM tutorial data:

    cd ..
    wget http://atgu.mgh.harvard.edu/xhmm/RUN.zip
    unzip RUN.zip
    cat low_complexity_targets.txt extreme_gc_targets.txt | sort -u > exclude_targets.txt
    
  3. Run DECA on a workstation:

    Note that you will need to set spark.local.dir to a suitable temporary directory for your system and likely need to change the number of cores, executor memory and driver memory to suitable values for your system. The resulting CNVs will be written to DECA.gff3.

    deca-submit \
    --master local[16] \
    --driver-class-path $DECA_JAR \
    --conf spark.local.dir=/path/to/temp/directory \
    --conf spark.driver.maxResultSize=0 \
    --conf spark.kryo.registrationRequired=true \
    --executor-memory 96G --driver-memory 16G \
    -- normalize_and_discover \
    -min_some_quality 29.5 \
    -exclude_targets exclude_targets.txt \
    -I DATA.RD.txt \
    -o DECA.gff3