martin-steinegger released this on 09 May 17:45
At a glance: Foldseek release 9 features the fully benchmarked Foldseek-multimer search and structure-based sequence search using ProstT5. Both Foldseek-multimer and structure-based sequence search are also available in the Foldseek webserver.
Major Features
- Foldseek-multimer: Fully benchmarked and integrated into this release with the `easy-multimersearch` and `multimersearch` workflows (thanks @Woosub-Kim). Check out our preprint explaining the algorithm. Read more on how to get started in our README.
- Search requires less memory: We optimized the memory consumption of Foldseek, which now requires significantly less memory (f629bbe).
- Structure-based sequence search: Predict 3Di sequences directly from amino acid sequences, without the need for existing protein structures. This is roughly 400-4000x faster than predicting full protein structures with ColabFold. This feature uses the ProstT5 protein language model and runs on the CPU by default:
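```sh
# Download the ProstT5 model weights
foldseek databases ProstT5 weights tmp
# Download the PDB as the target database
foldseek databases PDB pdb tmp
# Search amino acid queries against the PDB using 3Di predicted by ProstT5
foldseek easy-search QUERY.fasta pdb result.m8 tmp --prostt5-model weights
```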
Fast inference on GPUs (CUDA) is also supported. Compile from source with `cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=Path-To-Cuda-Toolkit` and call `createdb` or `easy-search` with `--prostt5-model weights --gpu 1` (thanks @Victor-Mihaila).
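The `result.m8` file written by `easy-search` is tab-separated, so it is easy to post-process. Below is a minimal Python sketch that parses it and keeps the best hit per query. It assumes the default `--format-output` column order (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits); if you request custom columns, adjust `COLUMNS`. The helper names and the E-value cutoff are illustrative choices, not part of Foldseek.

```python
import csv
from typing import Dict, Iterable, List

# Assumed default Foldseek --format-output columns; edit if you customize the output.
COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]
NUMERIC = {"fident": float, "alnlen": int, "mismatch": int, "gapopen": int,
           "qstart": int, "qend": int, "tstart": int, "tend": int,
           "evalue": float, "bits": float}


def parse_m8(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse tab-separated result lines into dicts with typed numeric fields."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit: Dict[str, object] = dict(zip(COLUMNS, row))
        for key, cast in NUMERIC.items():
            hit[key] = cast(hit[key])
        hits.append(hit)
    return hits


def best_hit_per_query(hits: List[Dict[str, object]],
                       max_evalue: float = 1e-3) -> Dict[str, Dict[str, object]]:
    """Keep the highest-bit-score hit per query among hits passing the E-value cutoff."""
    best: Dict[str, Dict[str, object]] = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue  # hit is not significant enough under this (hypothetical) cutoff
        current = best.get(hit["query"])
        if current is None or hit["bits"] > current["bits"]:
            best[hit["query"]] = hit
    return best
```

Usage: open `result.m8` and pass the file handle to `parse_m8`, then call `best_hit_per_query` on the result.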
Breaking changes
- Removed `.cif`/`.pdb` from file names and `_MODEL_` from identifiers in the `.lookup` file (#261, thanks @ChaSooyoung)
- Removed `--tar-include` and `--tar-exclude` from `createdb`, as they were unused (15c0516)
- Not breaking: Workflows using `easy-complexsearch` and `complexsearch` will continue to work. These are now hidden modules that map to `easy-multimersearch` and `multimersearch` internally. However, the internals have changed substantially since the last release.
Other features
- `convert2pdb` can output separate PDB files (346c1dd)
- `createdb` learned to read a large number of input files from a `.tsv` file (e1394aa)
- Force the input format with `createdb --input-format` (852434a)
- Compute the exact TM-score with `--exact-tmscore` (493cefe)
- Added CATH50 database (6893dcc)
- Updated HTML output (not fully supported for multimer yet; c7e4a37, 361c22a, 1bc8d2e; thanks @gamcil)
- `compressca` learned new input and output modes (8e68e86, 5d2724d, 284bc81)
Bug Fixes
- Fix broken symlinks when downloading with `databases PDB` (9ef6d18, fa6c530)
- Fix AFDB Proteome and SwissProt download check (fa6c530, thanks @TigerWindWood)
- Fix AF3 mmCIF files crashing `createdb`
- Fix `convert2pdb` creating broken PDB files for large structures (b6dac8a)
- Remove ligands and alternative residues within chains (#198, thanks @NatureGeorge)
- Skip residues without C-alpha (#214, 75a50f7)
- Fix `structurerescorediagonal` not properly respecting `--tmscore-threshold` (#205, 886021d)
- Fall back to Smith-Waterman alignment when block-aligner produces invalid alignments (54c271c)
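The "skip residues without C-alpha" fix (#214) concerns residues that have some atoms but no Cα. A minimal Python sketch of such a filter on PDB-format `ATOM` records, under the assumption of standard fixed-column PDB lines: atoms are grouped by (chain ID, residue number, insertion code), and only residues containing an atom named `CA` are kept. This illustrates the idea only; it is not Foldseek's parser, which also reads mmCIF and other formats.

```python
from typing import Dict, Iterable, List, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, insertion code)


def residues_with_ca(pdb_lines: Iterable[str]) -> Dict[ResidueKey, List[str]]:
    """Group ATOM records by residue and drop residues lacking a C-alpha atom."""
    residues: Dict[ResidueKey, List[str]] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # this sketch ignores HETATM (e.g. ligands), headers, etc.
        # Fixed PDB columns: chain at 22, resSeq at 23-26, iCode at 27 (1-based).
        key = (line[21], int(line[22:26]), line[26])
        residues.setdefault(key, []).append(line)
    # Keep residue order (dicts preserve insertion order) but only residues with a CA.
    return {
        key: atoms for key, atoms in residues.items()
        if any(atom[12:16].strip() == "CA" for atom in atoms)
    }
```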
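The last fix falls back to Smith-Waterman, the classic dynamic-programming local alignment, when the faster block-aligner returns an invalid result. A toy Python sketch of that pattern: a score-only Smith-Waterman with a linear gap penalty and simple match/mismatch scores (Foldseek itself scores 3Di and amino acid substitutions with affine gaps, which this sketch does not model), plus a hypothetical `align_with_fallback` wrapper. The validity check, scoring values, and names are illustrative assumptions, not Foldseek's code.

```python
from typing import Callable, Optional, Tuple

# (score, query_end, target_end); ends are 0-based exclusive positions.
Alignment = Tuple[int, int, int]


def smith_waterman(q: str, t: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Alignment:
    """Best local alignment score with a linear gap penalty, O(len(q) * len(t))."""
    prev = [0] * (len(t) + 1)
    best = (0, 0, 0)
    for i in range(1, len(q) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            # Local alignment: scores are clamped at 0 so alignments can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


def is_valid(aln: Optional[Alignment], q: str, t: str) -> bool:
    """Hypothetical sanity check: non-negative score and end positions in range."""
    if aln is None:
        return False
    score, q_end, t_end = aln
    return score >= 0 and 0 <= q_end <= len(q) and 0 <= t_end <= len(t)


def align_with_fallback(fast: Callable[[str, str], Optional[Alignment]],
                        q: str, t: str) -> Alignment:
    """Try the fast aligner first; use Smith-Waterman if its result is invalid."""
    aln = fast(q, t)
    return aln if is_valid(aln, q, t) else smith_waterman(q, t)
```

The fallback trades speed for robustness: Smith-Waterman is quadratic in sequence length, so it is only worth running on the rare pairs where the fast path fails.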