These algorithms are tuned to exploit multiple processors. Running the code with multiple processors will result in a significant performance improvement.
Due to its use of Docker, this code is runnable (and has been tested) on cloud computing platforms such as Google Compute Engine.
To run sbc-ngs from pre-compiled FASTA files of target sequences, run:
bash docker_build.sh
bash docker_run.sh [FULL_PATH_TO_DATA_DIRECTORY] [MIN_SEQ_LEN] [MAX_SEQ_FILES]
(e.g.bash docker_run.sh /Users/username/SequenceGenie/example/fasta 1000 128
The value [MIN_SEQ_LEN]
corresponds to the minimum sequence length of a
data read to be considered in the analysis. Depending upon the length of the
template sequences to be matched to, 1000 is a sensible default.
The value [MAX_SEQ_FILES]
corresponds to the number of data sequence files
to consider in the analysis. Note that a data sequence files (e.g. a fastq or
fasta file) is likely to contain multiple reads. This value corresponds to the
number of files to read, not the number of sequences. Increasing this value
(considering more data) will lead to more reliable results at the expense of
performance.
This uses example data provided here in the example/fasta
directory. The required
format of this directory follows that of example/fasta
. Specifically, the directory
must contain the following subdirectories:
-
data
. This directory contains both the sequence data to be analysed (typically in fastq format, but other formats also supported), along with a filebarcodes.csv
which defines the barcode sequences applied to each sample and the sequence id of each sample. This file requires the headerswell
,known_seq_id
,forward
andreverse
. -
seqs
. This directory contains fasta files of the template sequences against which the sequence data will be aligned. The fasta files must share the same names as the values in theknown_seq_id
column of thebarcodes.csv
file, described above.
(Note: this is the approach typically used in the SYNBIOCHEM centre.)
To run sbc-ngs from target sequences held in JBEI-ICE, run:
bash docker_build.sh
bash docker_run_ice.sh [TARGET_DIRECTORY] [ICE_USERNAME] [ICE_PASSWORD] [MIN_SEQ_LEN] [MAX_SEQ_FILES]
(e.g.bash docker_run_ice.sh /Users/username/SequenceGenie/example/ice [email protected] password 1000 128