
9-427df8a

@martin-steinegger martin-steinegger released this 09 May 17:45
· 285 commits to master since this release
427df8a

At a glance: Foldseek release 9 features the fully benchmarked Foldseek-multimer search and structure-based sequence search using ProstT5. Both Foldseek-multimer and structure-based sequence search are also available in the Foldseek webserver.

Major Features

  • Foldseek-multimer: Fully benchmarked and integrated into this release with the easy-multimersearch and multimersearch workflows (Thanks @Woosub-Kim). Check out our preprint explaining the algorithm.
    Read more on how to get started in our README.
  • Search requires less memory: We optimized the memory consumption of Foldseek. It now requires significantly less memory (f629bbe).
  • Structure-based sequence search: Predict protein 3Di directly from amino acid sequences without the need for existing protein structures. This is roughly 400-4000x faster than predicting full protein structures with ColabFold. This feature uses the ProstT5 protein language model and runs by default on CPU:
foldseek databases ProstT5 weights tmp
foldseek databases PDB pdb tmp
foldseek easy-search QUERY.fasta pdb result.m8 tmp --prostt5-model weights

Fast inference using GPU/CUDA is also supported. Compile from source with cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=1 -DCUDAToolkit_ROOT=Path-To-Cuda-Toolkit, then call createdb or easy-search with --prostt5-model weights --gpu 1.
(Thanks @Victor-Mihaila).
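The CPU and GPU invocations above differ only in the trailing --gpu 1 flag. A minimal sketch that wraps both follows; the function name prostt5_search and the FOLDSEEK override variable are hypothetical and not part of Foldseek, and QUERY.fasta, pdb, weights and tmp are the same placeholder paths used above.

```shell
# Hypothetical helper: runs the ProstT5 workflow from this release note.
# FOLDSEEK may be overridden (e.g. FOLDSEEK=echo) to print the commands
# instead of running them.
prostt5_search() {
  query="$1"           # amino acid FASTA file
  use_gpu="${2:-0}"    # 1 appends --gpu 1 (needs a CUDA-enabled build)
  fs="${FOLDSEEK:-foldseek}"

  # Download the ProstT5 weights and the PDB target database.
  "$fs" databases ProstT5 weights tmp || return 1
  "$fs" databases PDB pdb tmp || return 1

  # Build the search arguments, adding the GPU flag only when requested.
  set -- easy-search "$query" pdb result.m8 tmp --prostt5-model weights
  if [ "$use_gpu" = "1" ]; then
    set -- "$@" --gpu 1
  fi
  "$fs" "$@"
}
```

Calling prostt5_search QUERY.fasta runs on CPU and prostt5_search QUERY.fasta 1 adds the GPU flag. Setting FOLDSEEK=echo first gives a dry run that only prints the three commands.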

Breaking changes

  • Remove .cif/.pdb from filenames and remove _MODEL_ from identifiers in .lookup #261 (Thanks @ChaSooyoung)
  • Removed --tar-include and --tar-exclude from createdb as they were unused (15c0516)
  • Non-breaking: workflows using easy-complexsearch and complexsearch will continue to work. These are hidden modules that map to easy-multimersearch and multimersearch internally. However, the internals have changed substantially since the last release.
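To illustrate the module renaming, here is a rough sketch that accepts either the new or the legacy module name. The helper multimer_search and the FOLDSEEK override are hypothetical. The positional arguments (query, target, result, tmp) are an assumption modeled on the easy-search call shown earlier, and all four paths are placeholders.

```shell
# Hypothetical helper: runs a multimer search under either module name.
# FOLDSEEK may be overridden (e.g. FOLDSEEK=echo) to print the command.
multimer_search() {
  module="${1:-easy-multimersearch}"   # or the legacy easy-complexsearch
  fs="${FOLDSEEK:-foldseek}"

  # Accept only the new name and its legacy alias.
  case "$module" in
    easy-multimersearch|easy-complexsearch) ;;
    *) echo "unknown module: $module" >&2; return 2 ;;
  esac

  # Assumed argument order: query, target, result file, temporary directory.
  "$fs" "$module" QUERY.pdb targetDB result tmp
}
```

Per the note above, multimer_search easy-complexsearch should dispatch to the same internal workflow as the default call, so existing scripts need no change.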

Other features

  • convert2pdb can output separate PDB files (346c1dd)
  • createdb learned to read a large number of input files from a .tsv file (e1394aa)
  • Force input format with createdb --input-format (852434a)
  • Compute exact TM-score with --exact-tmscore (493cefe)
  • Added CATH50 database (6893dcc)
  • Update HTML output (not fully supported for multimer yet; c7e4a37, 361c22a, 1bc8d2e; Thanks @gamcil)
  • compressca learned new input and output modes (8e68e86, 5d2724d, 284bc81)

Bug Fixes

  • Fix broken symlinks with databases PDB download (9ef6d18, fa6c530).
  • Fix AFDB Proteome and SwissProt download check (fa6c530, Thanks @TigerWindWood)
  • Fix AF3 mmCIF files crashing createdb
  • Fix convert2pdb creating broken PDB files for large structures (b6dac8a)
  • Remove ligand and alt res within chain #198 (Thanks @NatureGeorge)
  • Skip residues without C-alpha #214 (75a50f7)
  • structurerescorediagonal did not properly respect --tmscore-threshold (#205; 886021d)
  • Fall back to Smith-Waterman alignment when block-aligner produces invalid alignments (54c271c)

Developers

  • Foldseek now includes the Candle ML framework and has a further expanded Rust codebase.
  • Foldseek can now serve as a base project from which subprojects inherit (e00a3dc, 7c2c08e, 9a1a087, 00d2033)