Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallelization across broadcasted dimensions #376

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 2 additions & 101 deletions docs/celestial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,8 +80,8 @@ of the package, e.g.::
>>> from reproject import reproject_exact

All functions share a common set of arguments, as well as including
algorithm-dependent arguments. In this section, we take a look at the common
arguments.
algorithm-dependent arguments. In this section, we take a look at some of the
common arguments.

The reprojection functions take two main arguments. The first argument is the
image to reproject, together with WCS information about the image. This can be
Expand Down Expand Up @@ -109,19 +109,6 @@ also be specified, and be given the shape of the output image using the Numpy
:class:`~astropy.io.fits.Header`, does not contain information about image
size).

For the interpolation and adaptive algorithms, an optional third argument,
``roundtrip_coords`` is accepted. By default, after coordinates are transformed
from the output plane to the input plane, the input-plane coordinates are
transformed back to the output plane to ensure that the transformation is
defined in both directions. This doubles the amount of
coordinate-transformation work to be done. In speed-critical situations, where
it is known that the coordinate transformation is defined everywhere, this
extra work can be disabled by setting ``roundtrip_coords=False``. (Note that
this is not a question of whether each output pixel maps to an existing *pixel*
in the input image and vice-versa, but whether it maps to a valid *coordinate*
in the coordinate system of the input image---regardless of whether that
coordinate falls within the bounds of the input image.)

As an example, we start off by opening a FITS file using Astropy::

>>> from astropy.io import fits
Expand Down Expand Up @@ -403,89 +390,3 @@ details).
.. warning:: The :func:`~reproject.reproject_exact` is currently known to
have precision issues for images with resolutions <0.05". For
now it is therefore best to avoid using it with such images.

Very large datasets
===================

If you have a very large dataset to reproject, i.e., any normal IFU or radio
spectral cube with many individual spectral channels - you may not be able to
hold two copies of the dataset in memory. In this case, you can specify an
output memory mapped array to store the data. For now this only works with the
interpolation reprojection methods.

.. doctest-skip::

>>> outhdr = fits.Header.fromtextfile('cube_header_gal.hdr')
>>> shape = (outhdr['NAXIS3'], outhdr['NAXIS2'], outhdr['NAXIS1'])
>>> outarray = np.memmap(filename='output.np', mode='w+', shape=shape, dtype='float32')
>>> hdu = fits.open('cube_file.fits')
>>> rslt = reproject.reproject_interp(hdu, outhdr, output_array=outarray,
... return_footprint=False,
... independent_celestial_slices=True)
>>> newhdu = fits.PrimaryHDU(data=outarray, header=outhdr)
>>> newhdu.writeto('new_cube_file.fits')

Or if you're dealing with FITS files, you can skip the numpy memmap step and use `FITS large file creation
<http://docs.astropy.org/en/stable/generated/examples/io/skip_create-large-fits.html>`_.

.. doctest-skip::

>>> outhdr = fits.Header.fromtextfile('cube_header_gal.hdr')
>>> outhdr.tofile('new_cube.fits')
>>> shape = tuple(outhdr['NAXIS{0}'.format(ii)] for ii in range(1, outhdr['NAXIS']+1))
>>> with open('new_cube.fits', 'rb+') as fobj:
>>> fobj.seek(len(outhdr.tostring()) + (np.product(shape) * np.abs(outhdr['BITPIX']//8)) - 1)
>>> fobj.write(b'\0')
>>> outhdu = fits.open('new_cube.fits', mode='update')
>>> rslt = reproject.reproject_interp(hdu, outhdr, output_array=outhdu[0].data,
... return_footprint=False,
... independent_celestial_slices=True)
>>> outhdu.flush()

Multiple images with the same coordinates
=========================================

If you have multiple images with the exact same coordinate system (e.g. a raw
image and a corresponding processed image) and want to reproject both to the
same output frame, it is faster to compute the coordinate mapping between input
and output pixels only once and re-use this mapping for each reprojection. This
is supported by passing multiple input images as an additional dimension in the
input data array. When the input array contains more dimensions than the input
WCS describes, the extra leading dimensions are taken to represent separate
images with the same coordinates, and the reprojection loops over those
dimensions after computing the pixel mapping. For example:

.. doctest-skip::
>>> raw_image, header_in = fits.getdata('raw_image.fits', header=True)
>>> bg_subtracted_image = fits.getdata('background_subtracted_image.fits')
>>> header_out = # Prepare your desired output projection here
>>> # Combine the two images into one array
>>> image_stack = np.stack((raw_image, bg_subtracted_image))
>>> # We provide a header that describes 2 WCS dimensions, but our input
>>> # array shape is (2, ny, nx)---the 'extra' first dimension represents
>>> # separate images sharing the same coordinates.
>>> reprojected, footprint = reproject.reproject_adaptive(
... (image_stack, header_in), header_out)
>>> # The shape of `reprojected` is (2, ny', nx')
>>> reprojected_raw, reprojected_bg_subtracted = reprojected[0], reprojected[1]

For :func:`~reproject.reproject_interp` and
:func:`~reproject.reproject_adaptive`, this is approximately twice as fast as
reprojecting the two images separately. For :func:`~reproject.reproject_exact`
the savings are much less significant, as producing the coordinate mapping is a
much smaller portion of the total runtime for this algorithm.

When the output coordinates are provided as a WCS and a ``shape_out`` tuple,
``shape_out`` may describe the output shape of a single image, in which case
the extra leading dimensions are prepended automatically, or it may include the
extra dimensions, in which case the size of the extra dimensions must match
those of the input data exactly.

While the reproject functions can accept the name of a FITS file as input, from
which the input data and coordinates are loaded automatically, this multi-image
reprojection feature does not support loading multiple images automatically
from multiple HDUs within one FITS file, as it would be difficult to verify
automatically that the HDUs contain the same exact coordinates. If multiple
HDUs do share coordinates and are to be reprojected together, they must be
separately loaded and combined into a single input array (e.g. using
``np.stack`` as in the above example).
108 changes: 0 additions & 108 deletions docs/dask.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ You can install *reproject* with `pip <http://www.pip-installer.org/en/latest/>`

or with `conda <https://continuum.io/>`_::

conda install -c astropy reproject
conda install -c conda-forge reproject

More detailed installation instructions can be found in :ref:`installation`.

Expand Down Expand Up @@ -137,7 +137,7 @@ that you want to reproject.
noncelestial
footprints
mosaicking
dask
performance

Reference/API
=============
Expand Down
23 changes: 9 additions & 14 deletions docs/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Installing reproject
Requirements
============

This package has the following hard run time dependencies:
This package has the following dependencies:

* `Python <http://www.python.org/>`__ 3.8 or later

Expand All @@ -19,35 +19,30 @@ This package has the following hard run time dependencies:

* `astropy-healpix <https://astropy-healpix.readthedocs.io>`_ 0.6 or later for HEALPIX image reprojection

and the following optional dependencies:

* `shapely <https://toblerity.org/shapely/project.html>`_ 1.6 or later for some of the mosaicking functionality

and to run the tests, you will also need:
* `dask <https://www.dask.org/>`_ 2021.8 or later

* `Matplotlib <http://matplotlib.org/>`__
* `zarr <https://zarr.readthedocs.io/en/stable/>`_

* `pytest-arraydiff <https://github.com/astrofrog/pytest-fits>`__
* `fsspec <https://filesystem-spec.readthedocs.io/en/latest/>`_

* `pytest-astropy <https://github.com/astropy/pytest-astropy>`__

* `pytest-doctestplus <https://github.com/astropy/pytest-doctestplus>`__
and the following optional dependencies:

* `shapely <https://toblerity.org/shapely/project.html>`_ 1.6 or later for some of the mosaicking functionality

Installation
============

Using pip
---------

To install *reproject* with `pip <http://www.pip-installer.org/en/latest/>`_,
simply run:
To install *reproject* with `pip <https://pip.pypa.io/en/stable/>`_,
run::

pip install reproject

Using conda
-----------

To install *reproject* with `anaconda <https://continuum.io/>`_, simply run::
To install *reproject* with `conda <https://docs.conda.io/en/latest/>`_, run::

conda install -c conda-forge reproject
Loading