reorganizing docs (work in progress)
wojdyr committed Dec 16, 2024
1 parent 6610157 commit 885be5f
Showing 11 changed files with 362 additions and 310 deletions.
149 changes: 63 additions & 86 deletions docs/analysis.rst
@@ -424,6 +424,69 @@ contacts to link definitions from :ref:`monomer library <CCD_etc>`
and to connections (LINK, SSBOND) from the structure.
If you find it useful, please contact the author.

Matthews coefficient
====================

The Matthews coefficient V\ :sub:`M` is defined as the crystal volume
per unit of protein molecular weight. Typically, the molecular weight
for V\ :sub:`M` is calculated from a sequence,
and that's what this section is mostly about.
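
The definition can be written as a formula. This is a sketch in our own
notation (not taken from the gemmi sources), where *V*\ :sub:`cell` is
the unit cell volume, *n* is the number of asymmetric units in the cell
and *M* is the molecular weight of the protein in one asymmetric unit:

.. math::

    V_M = \frac{V_\mathrm{cell}}{n \, M}

V\ :sub:`M` is usually given in Å\ :sup:`3`/Da.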

First, let's read a structure and get a protein sequence:

.. doctest::

    >>> st = gemmi.read_structure('../tests/5cvz_final.pdb')
    >>> st.setup_entities()  # it should sort out chain parts
    >>> list(st[0])
    [<gemmi.Chain A with 141 res>]
    >>> # we have just a single chain, which makes this example simpler
    >>> chain = st[0]['A']
    >>> chain.get_polymer()
    <gemmi.ResidueSpan of 141: Axp [17(ALA) 18(ALA) 19(ALA) ... 157(SER)]>
    >>> st.get_entity_of(_)  # doctest: +ELLIPSIS
    <gemmi.Entity 'A' polymer polypeptide(L) object at 0x...>
    >>> sequence = _.full_sequence

Gemmi provides a simple function to calculate the molecular weight
from a sequence, using a built-in table of popular residues:

.. doctest::

    >>> weight = gemmi.calculate_sequence_weight(_.full_sequence)
    >>> # Now we can calculate Matthews coefficient
    >>> st.cell.volume_per_image() / weight
    3.1983428753317003

We can continue and calculate the solvent content, assuming a protein
density of 1.35 g/cm\ :sup:`3` (the other constants below are the Avogadro
number and the conversion factor 1 Å\ :sup:`3` = 10\ :sup:`-24` cm\ :sup:`3`):

.. doctest::

    >>> protein_fraction = 1. / (6.02214e23 * 1e-24 * 1.35 * _)
    >>> print('Solvent content: {:.1f}%'.format(100 * (1 - protein_fraction)))
    Solvent content: 61.5%
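
The same arithmetic can be wrapped in a small helper. This is only a sketch:
the function name `solvent_content` and its parameters are made up for this
example and are not part of gemmi. With the density of 1.35 g/cm\ :sup:`3`
the formula reduces to the familiar approximation 1 - 1.23/V\ :sub:`M`.

.. code-block:: python

    AVOGADRO = 6.02214e23   # 1/mol
    CM3_PER_A3 = 1e-24      # one cubic angstrom expressed in cm^3

    def solvent_content(vm, protein_density=1.35):
        """Return the solvent fraction (0-1) for a Matthews coefficient.

        vm is given in A^3/Da, protein_density in g/cm^3.
        Hypothetical helper, not a gemmi function.
        """
        if vm <= 0:
            raise ValueError('Matthews coefficient must be positive')
        # volume (in A^3) occupied by 1 Da of protein at the given density
        protein_volume_per_dalton = 1.0 / (AVOGADRO * CM3_PER_A3 * protein_density)
        protein_fraction = protein_volume_per_dalton / vm
        return 1.0 - protein_fraction

    # the value of V_M calculated above for 5cvz
    print('Solvent content: {:.1f}%'.format(100 * solvent_content(3.1983428753317003)))

Keeping the density as a parameter makes it easy to check how sensitive
the result is to the assumed value.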

If the sequence includes rare chemical components
(outside of the top 300+ most popular components in the PDB), you may
specify the average weight of the components that are not tabulated:

.. doctest::

    >>> sequence = ['DSN', 'ALA', 'N2C', 'MVA', 'DSN', 'ALA', 'NCY', 'MVA']
    >>> gemmi.calculate_sequence_weight(sequence, unknown=130.0)
    784.6114543066407

The weights are assumed to be of unbonded residues. Therefore, the chain weight
is calculated as a sum of all components minus
(*N*--1) × weight of H\ :sub:`2`\ O.
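
The rule above can be illustrated in a few lines of plain Python.
This is a sketch under stated assumptions: the table below holds rounded
weights of three free amino acids and is not gemmi's built-in table,
and `chain_weight` is a made-up name.

.. code-block:: python

    WATER = 18.015  # g/mol, weight of H2O released per peptide bond

    # Illustrative weights of unbonded residues (g/mol); NOT gemmi's table.
    RESIDUE_WEIGHT = {'ALA': 89.09, 'GLY': 75.07, 'SER': 105.09}

    def chain_weight(sequence, unknown=0.0):
        """Sum of component weights minus (N-1) water molecules.

        Components missing from the table are counted with weight `unknown`.
        """
        if not sequence:
            return 0.0
        total = sum(RESIDUE_WEIGHT.get(name, unknown) for name in sequence)
        return total - (len(sequence) - 1) * WATER

    # Ala-Gly-Ser: 89.09 + 75.07 + 105.09 - 2 * 18.015
    print(round(chain_weight(['ALA', 'GLY', 'SER']), 2))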

.. note::

    Gemmi includes a program that calculates the Matthews coefficient
    and the solvent content: :ref:`gemmi-contents <gemmi-contents>`.

Superposition
=============

@@ -1131,89 +1194,3 @@ where


TBC

.. _pdb_dir:

Local copy of the PDB archive
=============================

Some of the examples in this documentation work with a local copy
of the Protein Data Bank archive. This subsection describes
the assumed setup.

As in BioJava, we assume that the `$PDB_DIR` environment variable
points to a directory that contains `structures/divided/mmCIF` -- the same
arrangement as on the
`PDB's FTP <ftp://ftp.wwpdb.org/pub/pdb/data/structures/>`_ server.

.. code-block:: console

    $ cd $PDB_DIR
    $ du -sh structures/*/*  # as of Jun 2017
    34G    structures/divided/mmCIF
    25G    structures/divided/pdb
    101G   structures/divided/structure_factors
    2.6G   structures/obsolete/mmCIF

A traditional way to keep an up-to-date local archive is to rsync it
once a week:

.. code-block:: shell

    #!/bin/sh -x
    set -u  # PDB_DIR must be defined
    rsync_subdir() {
      mkdir -p "$PDB_DIR/$1"
      # Using PDBe (UK) here, can be replaced with RCSB (USA) or PDBj (Japan),
      # see https://www.wwpdb.org/download/downloads
      rsync -rlpt -v -z --delete \
          rsync.ebi.ac.uk::pub/databases/pdb/data/$1/ "$PDB_DIR/$1/"
    }
    rsync_subdir structures/divided/mmCIF
    #rsync_subdir structures/obsolete/mmCIF
    #rsync_subdir structures/divided/pdb
    #rsync_subdir structures/divided/structure_factors

Gemmi has a helper function for using the local archive copy.
It takes a PDB code (case insensitive) and a symbol denoting what file
is requested: P for PDB, M for mmCIF, S for SF-mmCIF.

.. doctest::

    >>> os.environ['PDB_DIR'] = '/copy'
    >>> gemmi.expand_if_pdb_code('1ABC', 'P')  # PDB file
    '/copy/structures/divided/pdb/ab/pdb1abc.ent.gz'
    >>> gemmi.expand_if_pdb_code('1abc', 'M')  # mmCIF file
    '/copy/structures/divided/mmCIF/ab/1abc.cif.gz'
    >>> gemmi.expand_if_pdb_code('1abc', 'S')  # SF-mmCIF file
    '/copy/structures/divided/structure_factors/ab/r1abcsf.ent.gz'

If the first argument is not in the PDB code format (4 characters for now),
the function returns it unchanged.

.. doctest::

    >>> arg = 'file.cif'
    >>> gemmi.is_pdb_code(arg)
    False
    >>> gemmi.expand_if_pdb_code(arg, 'M')
    'file.cif'
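
To make the directory layout explicit, here is a plain-Python sketch that
reproduces the paths shown above. It is not gemmi's implementation:
the names `looks_like_pdb_code` and `expand_code` are made up, and the
code check (a digit followed by three alphanumeric characters) is
an assumption that may differ from what `gemmi.is_pdb_code` does.

.. code-block:: python

    import os

    # symbol -> (subdirectory, file name pattern), as in the examples above
    LAYOUT = {
        'P': ('pdb', 'pdb{code}.ent.gz'),
        'M': ('mmCIF', '{code}.cif.gz'),
        'S': ('structure_factors', 'r{code}sf.ent.gz'),
    }

    def looks_like_pdb_code(name):
        # Assumption: 4 characters, a digit followed by three alphanumerics.
        return len(name) == 4 and name[0].isdigit() and name.isalnum()

    def expand_code(name, kind, pdb_dir=None):
        """Return a path in the local archive, or `name` if it is not a code."""
        if not looks_like_pdb_code(name):
            return name
        if pdb_dir is None:
            pdb_dir = os.environ['PDB_DIR']  # KeyError if it is not set
        code = name.lower()
        subdir, pattern = LAYOUT[kind]
        # files are grouped in directories named after the two middle characters
        return '/'.join([pdb_dir, 'structures', 'divided', subdir,
                         code[1:3], pattern.format(code=code)])

    print(expand_code('1ABC', 'P', pdb_dir='/copy'))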

Multiprocessing
===============

(Python-specific)

Most gemmi objects cannot be pickled and therefore cannot be
passed between processes when using the multiprocessing module.
Currently, the only picklable classes (with protocol >= 2) are
UnitCell and SpaceGroup.

Usually, it is possible to organize multiprocessing in such a way that
gemmi objects are not passed between processes. The example script below
traverses subdirectories and asynchronously analyzes coordinate files,
using 4 worker processes in parallel.

.. literalinclude:: ../examples/multiproc.py
    :language: python
    :lines: 4-
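
The general pattern can also be shown without gemmi. The sketch below is
not the example script included above; it is a simplified, hypothetical
stand-in in which only file paths (strings) go to the workers and only
plain tuples come back, so nothing unpicklable crosses process boundaries.
In real use, the worker would call `gemmi.read_structure(path)` internally
and return numbers or strings derived from the structure.

.. code-block:: python

    import multiprocessing
    import os

    def count_atom_records(path):
        """Worker: takes a path (str) and returns a picklable tuple."""
        count = 0
        with open(path, errors='replace') as f:
            for line in f:
                if line.startswith(('ATOM', 'HETATM')):
                    count += 1
        return path, count

    def find_pdb_files(top_dir):
        """Yield paths of *.pdb files found in top_dir and its subdirectories."""
        for dirpath, _dirnames, filenames in os.walk(top_dir):
            for name in sorted(filenames):
                if name.endswith('.pdb'):
                    yield os.path.join(dirpath, name)

    def analyze_tree(top_dir, workers=4):
        """Analyze all files in parallel; results arrive in arbitrary order."""
        with multiprocessing.Pool(processes=workers) as pool:
            results = pool.imap_unordered(count_atom_records,
                                          find_pdb_files(top_dir))
            return sorted(results)

On platforms where worker processes are spawned rather than forked,
`analyze_tree()` should be called from under `if __name__ == '__main__':`.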
46 changes: 0 additions & 46 deletions docs/chemistry.rst
@@ -476,49 +476,3 @@ The `logging` argument above is described in the next section.

TBC


.. _logger:

Logger
======

Gemmi Logger is a tiny helper class for passing messages from a gemmi function
to the calling function. It doesn't belong in this section, but it's
documented here because it's used in the previous subsection and I haven't found
a better spot for it.

The messages being passed are usually info or warnings that a command-line
program would print to stdout or stderr.

The Logger has two member variables:

.. literalinclude:: ../include/gemmi/logger.hpp
    :language: cpp
    :start-at: ///
    :end-at: int threshold

and a few member functions for sending messages.

When a function takes a Logger argument, we can pass:

**C++**

* `{&Logger::to_stderr}` to redirect messages to stderr
  (to_stderr() calls fprintf),
* `{&Logger::to_stdout}` to redirect messages to stdout,
* `{&Logger::to_stdout, 3}` to print only warnings (threshold=3),
* `{nullptr, 0}` to disable all messages,
* `{}` to throw errors and ignore other messages (the default, see Quirk above),
* `{[](const std::string& s) { do_anything(s);}}` to do anything else.

**Python**

* `sys.stderr` or `sys.stdout` or any other stream (an object with `write`
  and `flush` methods), to redirect messages to that stream,
* `(sys.stdout, 3)` to print only warnings (threshold=3),
* `(None, 0)` to disable all messages,
* `None` to throw errors and ignore other messages (the default, see Quirk above),
* a function that takes a message string as its only argument
  (e.g. `lambda s: print(s.upper())`).
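
On the Python side, the stream option only requires an object with `write`
and `flush` methods. The sketch below uses a made-up class,
`MessageCollector`, that stores messages in a list instead of printing them.
No gemmi function is called here; the `write()` call only imitates what
a function with a Logger argument would do.

.. code-block:: python

    class MessageCollector:
        """Minimal stream-like object: it only needs write() and flush()."""

        def __init__(self):
            self.messages = []

        def write(self, text):
            # store each non-empty line as a separate message
            self.messages.extend(line for line in text.splitlines()
                                 if line.strip())

        def flush(self):
            pass  # nothing is buffered

    collector = MessageCollector()
    # imitate a message that a function would send to the logger
    collector.write('Warning: example message\n')
    print(collector.messages)

Such an object could then be passed wherever a Logger argument is accepted,
either alone or as `(collector, 3)` to keep only warnings.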


27 changes: 26 additions & 1 deletion docs/conf.py
@@ -21,7 +21,8 @@
version = _line.split()[2].strip('"')
release = version

exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# not sure if we'll use headers.rst again, disable it for now
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'headers.rst' ]
pygments_style = 'sphinx'
todo_include_todos = False
highlight_language = 'cpp'
@@ -43,6 +44,30 @@
html_show_sourcelink = False
html_copy_source = False

def setup(app):
    app.connect("builder-inited", monkey_patching_furo)

def monkey_patching_furo(app):
    if app.builder.name != 'html':
        return

    import furo
    def _compute_navigation_tree(context: Dict[str, Any]) -> str:
        # The navigation tree, generated from the sphinx-provided ToC tree.
        if "toctree" in context:
            toctree = context["toctree"]
            toctree_html = toctree(
                collapse=False,
                titles_only=False,
                maxdepth=2,
                includehidden=True,
            )
        else:
            toctree_html = ""
        return furo.get_navigation_tree(toctree_html)

    furo._compute_navigation_tree = _compute_navigation_tree

# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
5 changes: 5 additions & 0 deletions docs/hkl.rst
@@ -1001,6 +1001,11 @@ program documentation for details.
>>> # and convert it back
>>> cif_string = gemmi.MtzToCif().write_cif_to_string(_)

XDS_ASCII
=========

TODO: document functions from `xds_ascii.hpp`


SX hkl CIF
==========
107 changes: 99 additions & 8 deletions docs/index.rst
@@ -1,11 +1,15 @@
.. meta::
:google-site-verification: LsEfb1rjo2RL8WOSZGigV11Kgyhtk9v1Vb-6GZFnHKo

GEMMI - library for structural biology
======================================
Overview
########

Gemmi is a library, accompanied by a set of programs,
developed primarily for use in **macromolecular crystallography** (MX).
What is it for?
===============

Gemmi is a library, accompanied by a :ref:`set of programs <program>`,
developed primarily for use in **structural biology**,
and in particular in **macromolecular crystallography** (MX).
For working with:

* macromolecular models (content of PDB, PDBx/mmCIF and mmJSON files),
@@ -53,22 +57,109 @@ Source code repository: https://github.com/project-gemmi/gemmi
.. _me: [email protected]

Contents
--------
========

.. toctree::
:maxdepth: 2
:maxdepth: 1

Introduction <self>
Overview <self>
install
program

.. toctree::
:caption: Prerequisites
:maxdepth: 2

cif
symmetry
cell
misc

.. toctree::
:caption: Working with Molecules
:maxdepth: 2

chemistry
mol
analysis

.. toctree::
:caption: Working with Data
:maxdepth: 2

grid
hkl
scattering
program

.. toctree::
:caption: Other Docs

ChangeLog <https://github.com/project-gemmi/gemmi/releases>
Python API reference <https://project-gemmi.github.io/python-api/>
C++ API reference <https://project-gemmi.github.io/cxx-api/>

Credits
=======

This project uses code from a number of third-party open-source projects.

Projects used in the C++ library, included under
`include/gemmi/third_party/` (if used in headers) or `third_party/`:

* `PEGTL <https://github.com/taocpp/PEGTL/>`_ -- library for creating PEG
parsers. License: MIT.
* `sajson <https://github.com/chadaustin/sajson>`_ -- high-performance
JSON parser. License: MIT.
* `PocketFFT <https://gitlab.mpcdf.mpg.de/mtr/pocketfft>`_ -- FFT library.
License: 3-clause BSD.
* `stb_sprintf <https://github.com/nothings/stb>`_ -- locale-independent
snprintf() implementation. License: Public Domain.
* `fast_float <https://github.com/fastfloat/fast_float>`_ -- locale-independent
number parsing. License: Apache 2.0.
* `tinydir <https://github.com/cxong/tinydir>`_ -- directory (filesystem)
reader. License: 2-clause BSD.

Code derived from the following projects is used in the library:

* `ksw2 <https://github.com/lh3/ksw2>`_ -- sequence alignment in
`seqalign.hpp` is based on the ksw_gg function from ksw2. License: MIT.
* `QCProt <https://theobald.brandeis.edu/qcp/>`_ -- superposition method
in `qcp.hpp` is taken from QCProt and adapted to our project. License: BSD.
* `Larch <https://github.com/xraypy/xraylarch>`_ -- calculation of f' and f"
in `fprime.cpp` is based on CromerLiberman code from Larch.
License: 2-clause BSD.

Projects included under `third_party/` that are not used in the library
itself, but are used in command-line utilities, python bindings or tests:

* `zpp serializer <https://github.com/eyalz800/serializer>`_ --
serialization framework. License: MIT.
* `The Lean Mean C++ Option Parser <http://optionparser.sourceforge.net/>`_ --
command-line option parser. License: MIT.
* `doctest <https://github.com/onqtam/doctest>`_ -- testing framework.
License: MIT.
* `linalg.h <http://github.com/sgorsten/linalg/>`_ -- linear algebra library.
License: Public Domain.
* `zlib <https://github.com/madler/zlib>`_ -- a subset of the zlib library
for decompressing gz files, used as a fallback when the zlib library
is not found in the system. License: zlib.

Not distributed with Gemmi:

* `nanobind <https://github.com/wjakob/nanobind>`_ -- used for creating
Python bindings. License: 3-clause BSD.
* `zlib-ng <https://github.com/zlib-ng/zlib-ng>`_ -- optional, can be used
instead of zlib for faster reading of gzipped files.
* `cctbx <https://github.com/cctbx/cctbx_project>`_ -- used in tests
(if cctbx is not present, these tests are skipped) and
in scripts that generated space group data and 2-fold twinning operations.
License: 3-clause BSD.

Mentions:

* `NLOpt <https://github.com/stevengj/nlopt>`_
was used to try out various optimization methods for class Scaling.
License: MIT.

Email me if I forgot about something.
