
database setup

Nadja Brait edited this page Jul 16, 2024 · 1 revision

Default database setup

By default, detectEVE downloads and configures two diamond databases (27 GB of disk space):

  • viral protein database: an RVDB database, clustered to 80%
  • generic protein database: a UniRef50 database
./detectEVE --setup-databases [--snake ARGS]

examples:

# keep only the final .dmnd files in the default folder
./detectEVE --setup-databases
# run on 16 cores and also keep the downloaded source files (--notemp)
./detectEVE --setup-databases --snake '--cores 16 --notemp'
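Because the default setup needs about 27 GB, it can help to check free space before starting. The sketch below is a hypothetical helper (`enough_space` and `REQUIRED_GB` are not part of detectEVE); it assumes a POSIX `df` and treats the 27 GB figure above as the requirement, which may change with new database releases.

```shell
# Hypothetical pre-flight check, not part of detectEVE.
# REQUIRED_GB mirrors the ~27 GB quoted above; real usage varies by release.
REQUIRED_GB=27

# enough_space [DIR] [GB]: succeed if the filesystem holding DIR has at
# least GB gigabytes (GiB) free, otherwise fail.
enough_space() {
  local dir=${1:-.} need_gb=${2:-$REQUIRED_GB} avail_kb
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Field 4 of the second output line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example, `enough_space databases && ./detectEVE --setup-databases` only starts the setup when the check passes. The directory has to exist for `df` to inspect it; otherwise pass its parent.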

Advanced database setup

If you need the databases in a different location, set db_dir in config.yaml to a path that suits your system.
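To confirm where the databases will end up, you can read db_dir back out of config.yaml. This is a minimal sketch: `get_db_dir` is a hypothetical helper, and it assumes db_dir is written as a simple top-level `db_dir: path` line. It does not parse YAML in general (no multi-line values, anchors, or nested keys).

```shell
# get_db_dir [CONFIG]: print the value of a top-level "db_dir:" line.
# Hypothetical helper; only handles a plain "db_dir: value" line.
get_db_dir() {
  awk -v sq="'" '
    /^db_dir:/ {
      sub(/^db_dir:[ \t]*/, "")                # drop the key itself
      sub(/[ \t]+#.*$/, "")                    # drop a trailing "# comment"
      sub(/[ \t]+$/, "")                       # drop trailing whitespace
      gsub("^[\"" sq "]|[\"" sq "]$", "")      # drop surrounding quotes
      print
      exit
    }' "${1:-config.yaml}"
}
```

Running `get_db_dir config.yaml` from the detectEVE folder prints the configured path, which is also where manually downloaded files belong.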

If you prefer to download files manually or to reuse existing ones, copy every file you don't want detectEVE to download automatically into databases/ (or into the directory set as db_dir in config.yaml) before running --setup-databases. The commands below fetch each file by hand.
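Before running --setup-databases, a quick listing shows which inputs are already in place. In this hypothetical helper (`report_present`), the file names are taken from the manual download commands on this page; that detectEVE looks for exactly these names is an assumption, so adjust the list to match your version.

```shell
# report_present [DIR]: say which manually fetched inputs already exist in DIR.
# File names mirror the manual download commands on this page (assumption).
report_present() {
  local dir=${1:-databases} f
  for f in rvdb100.faa.xz uniref50.fasta.gz prot.accession2taxid.FULL.gz \
           nodes.dmp names.dmp; do
    # -s: the file exists and is not empty
    if [ -s "$dir/$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}
```

Empty files count as missing, since a zero-byte file usually means a download failed.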

Note, however, that unless you add --notemp to the --snake arguments, all files except the final diamond-formatted databases are deleted from databases/ at the end of the setup phase.
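To see what a setup run kept besides the diamond databases, you can list every file that is not a .dmnd file. A sketch with a hypothetical helper (`leftover_files`), assuming the final databases use the `.dmnd` extension as in the examples above; comparing its output after runs with and without --notemp shows which download files were retained.

```shell
# leftover_files [DIR]: print the names of regular files in DIR (not in
# subfolders) that are not .dmnd databases, sorted. Hypothetical helper.
leftover_files() {
  find "${1:-databases}" -maxdepth 1 -type f ! -name '*.dmnd' \
    -exec basename {} \; | sort
}
```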


cd databases/

Latest RVDB

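# find the current release file name on the download page, then
# download it only if one was found (&&)
url=https://rvdb-prot.pasteur.fr
db=$(curl -fsS "$url/" | grep -oPm1 'files/U-RVDBv[0-9.]+-prot\.fasta\.xz') &&
curl -fL "$url/$db" -o rvdb100.faa.xz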
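`grep -P` is a GNU grep option and may be missing, for example in some BSD or macOS grep builds. A portable variant of the file-name lookup uses `grep -E` instead. `latest_rvdb_path` is a hypothetical helper name, and like the command above it assumes the newest release is the first matching link on the page.

```shell
# latest_rvdb_path: read an HTML page on stdin and print the first
# "files/U-RVDBv<version>-prot.fasta.xz" link it contains.
latest_rvdb_path() {
  # -E: extended regex (portable); -o: print only the matching part
  grep -Eo 'files/U-RVDBv[0-9.]+-prot\.fasta\.xz' | head -n 1
}
```

Usage: `db=$(curl -fsS https://rvdb-prot.pasteur.fr/ | latest_rvdb_path)`, followed by the same download command as above.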

UniRef50

wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
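Large downloads such as UniRef50 can end up truncated if a connection drops. A minimal sketch to test the compressed files before running setup; `check_archives` is a hypothetical helper, and it assumes `gzip` and (for the RVDB file) `xz` are installed. Testing multi-gigabyte archives can take a while.

```shell
# check_archives [DIR]: integrity-test every .gz and .xz file in DIR.
# Returns non-zero and names the file if any archive is corrupt or incomplete.
check_archives() {
  local f rc=0
  for f in "${1:-.}"/*.gz "${1:-.}"/*.xz; do
    [ -e "$f" ] || continue            # skip patterns that matched nothing
    case $f in
      *.gz) gzip -t "$f" ;;            # also covers taxdump.tar.gz
      *.xz) xz -t "$f" ;;
    esac || { echo "corrupt or incomplete: $f" >&2; rc=1; }
  done
  return "$rc"
}
```

An incomplete file can often be resumed with `wget -c` instead of being downloaded again from scratch.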

NCBI taxonomy

wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -xzf taxdump.tar.gz nodes.dmp names.dmp
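To spot-check the extracted taxonomy, you can look up a taxon's scientific name in names.dmp. In this sketch (`sci_name` is a hypothetical helper), fields are split on the `<TAB>|<TAB>` separator used in the NCBI .dmp files, where column 1 is the taxid, column 2 the name, and column 4 the name class.

```shell
# sci_name TAXID [NAMES_DMP]: print the scientific name recorded for TAXID.
sci_name() {
  awk -v id="$1" '
    BEGIN { FS = "\t[|]\t" }                   # .dmp field separator
    $1 == id && $4 ~ /^scientific name/ { print $2; exit }
  ' "${2:-names.dmp}"
}
```

For example, `sci_name 10239` run inside databases/ should print `Viruses`.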