database setup
Nadja Brait edited this page Jul 16, 2024 · 1 revision
By default, detectEVE downloads and configures two DIAMOND databases (disk space: 27 GB):
- viral protein database: a RVDB database, clustered to 80%
- generic protein database: a UNIREF50 database
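
Before starting the setup, it can help to confirm that the target filesystem has enough room. The following is a minimal sketch, not part of detectEVE: `free_gb` is a hypothetical helper, and `DB_DIR` is an assumed environment variable standing in for your database location. It compares the free space against the 27 GB figure quoted above; this page does not say whether that figure also covers temporary download files, so treat it as a lower bound.

```shell
# Hypothetical helper (not part of detectEVE): print the free space, in whole GB,
# on the filesystem that holds the given directory.
free_gb() {
  # df -Pk: POSIX output format, sizes in 1024-byte blocks; column 4 = available
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

required_gb=27            # figure quoted on this page
target_dir=${DB_DIR:-.}   # assumption: set DB_DIR to your database folder, else check "."

avail_gb=$(free_gb "$target_dir")
if [ "$avail_gb" -lt "$required_gb" ]; then
  echo "warning: only ${avail_gb} GB free in ${target_dir}; about ${required_gb} GB needed" >&2
else
  echo "ok: ${avail_gb} GB free in ${target_dir}"
fi
```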
./detectEVE --setup-databases [--snake ARGS]
examples:
# only .dmnd files are kept in default folder
./detectEVE --setup-databases
# db download files are kept as well
./detectEVE --setup-databases --snake '--cores 16 --notemp'
If you need databases in a different location you can adjust db_dir in config.yaml to whatever suits your system.
If you prefer to handle downloads manually or use existing files, copy any file you don't want detectEVE to download automatically into databases/ (or the respective config.yaml/db_dir) before running --setup-databases.
Note, though, that unless you add --notemp to the --snake arguments, all files except the final diamond-formatted database files will be deleted from databases/ at the end of the setup phase.
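
As a rough sketch of the manual route, the helper below copies an existing file into the database folder only if it is not already there. `stage_db_file` is a hypothetical name, the source path in the usage comment is a placeholder, and it is an assumption that detectEVE expects the same file names as the manual download commands on this page (for example uniref50.fasta.gz).

```shell
# Hypothetical helper (not part of detectEVE): copy a file into the database
# folder unless a file of the same name already exists there.
stage_db_file() {
  src=$1
  dest_dir=${2:-databases}   # default folder named on this page
  mkdir -p "$dest_dir"
  dest="$dest_dir/$(basename "$src")"
  if [ -e "$dest" ]; then
    echo "skip: $dest already exists"
  else
    cp "$src" "$dest" && echo "staged: $dest"
  fi
}

# Usage (the source path is a placeholder):
#   stage_db_file /data/mirror/uniref50.fasta.gz databases
#   ./detectEVE --setup-databases --snake '--notemp'   # --notemp keeps the staged files
```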
cd databases/
# find the latest RVDB protein release (grep -P requires GNU grep)
url=https://rvdb-prot.pasteur.fr
db=$(curl -fs "$url/" | grep -oPm1 'files/U-RVDBv[0-9.]+-prot\.fasta\.xz')
curl -f "$url/$db" -o rvdb100.faa.xz
# UniRef50 protein sequences
wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
# NCBI accession-to-taxid mapping and taxonomy dump
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
# extract only the two taxonomy tables
tar -xzf taxdump.tar.gz nodes.dmp names.dmp
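
Large downloads are occasionally truncated, so it may be worth testing the archives before running the setup. This is a minimal sketch, assuming it is run inside databases/ after the commands above; `check_archive` is a hypothetical helper that relies only on the standard `xz -t` and `gzip -t` integrity tests, which read the whole file without unpacking it.

```shell
# Hypothetical helper (not part of detectEVE): test a compressed download
# without unpacking it. Returns non-zero if the archive is damaged.
check_archive() {
  case "$1" in
    *.xz) xz -t "$1" ;;
    *.gz) gzip -t "$1" ;;   # also covers .tar.gz
    *)    echo "unknown archive type: $1" >&2; return 2 ;;
  esac
}

# File names as used in the manual download commands on this page
for f in rvdb100.faa.xz uniref50.fasta.gz prot.accession2taxid.FULL.gz taxdump.tar.gz; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"
  elif check_archive "$f"; then
    echo "ok: $f"
  else
    echo "corrupt: $f"
  fi
done
```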