Many-core incoherent dedispersion algorithm in OpenCL, with classes to use them in C++.
- Alessio Sclocco, Joeri van Leeuwen, Henri E. Bal, Rob V. van Nieuwpoort. Real-time dedispersion for fast radio transient surveys, using auto tuning on many-core accelerators. Astronomy and Computing, 2016, 14, 1-7. (print) (preprint) (arxiv)
- Alessio Sclocco, Joeri van Leeuwen, Henri E. Bal, Rob V. van Nieuwpoort. A Real-Time Radio Transient Pipeline for ARTS. 3rd IEEE Global Conference on Signal & Information Processing, December 14-16, 2015, Orlando (Florida), USA. (print) (preprint) (slides)
- Alessio Sclocco, Henri E. Bal, Rob V. van Nieuwpoort. Finding Pulsars in Real-Time. IEEE International Conference on eScience, 31 August - 4 September, 2015, Munich, Germany. (print) (preprint) (slides)
- Alessio Sclocco, Henri E. Bal, Jason Hessels, Joeri van Leeuwen, Rob V. van Nieuwpoort. Auto-Tuning Dedispersion for Many-Core Accelerators. 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 19-23, 2014, Phoenix (Arizona), USA. (print) (preprint)
Set the INSTALL_ROOT
environment variable to the location of the pipeline sourcode.
If this package is installed in $HOME/Code/APERTIF/Dedispersion
this would be:
$ export INSTALL_ROOT=$HOME/Code/APERTIF
Then build and test as follows:
$ make install
If AstroData is compiled with PSRDADA support, please set the PSRDADA
environment variable to the psrdada build directory
The dedispersion step is typically compiled as part of a larger pipeline, but this repo contains two example programs in the bin/
directory to test and autotune a dedispersion kernel.
Checks if the output of the CPU is the same for the GPU. The CPU is assumed to be always correct. Needs platform, data layout, and kernel configuration parameters (see below).
Tune the dedispersion kernel's parameters by doing a complete sampling of the parameter space. Kernel configuration and runtime statistics are written to stdout. The commandline parameters are as above, except for the kernel configuration parameters. Needs platform, data layout, and tuning parameters (see below).
The output can be analyzed using the python scripts in in the analysis directory.
Description of common commandline arguments for the separate binaries.
- opencl_platform OpenCL platform
- opencl_device OpenCL device number
- input_bits number of bits used to represent a single input item
- padding cacheline size, in bytes, of the OpenCL device
- vector vector size, in number of input items, of the OpenCL device
-
channels Number of channels
-
min_freq Frequency of first channel
-
channel_bandwidth Mhz
-
samples Number of samples in a batch, ie. length of time dimension; should be divisible by threads0, items0, and threads0 x items0
-
dms Number of dispersion measures, ie. length of dm dimension; should be divisible by threads1, items1, and threads1 x items1
-
dm_first Dispersion measure [parsec/cc]
-
dm_step Dispersion measure step size [parsec/cc]
-
zapped_channels File containing tainted channels, or empty file
-
split-seconds Optional. Sets a different way of treating the input: (not implemented in subband, unclear if it will be useful). Reduces data transfers but slows down computation.
- default mode: data is continuous in memmory
- split-seconds mode: data is blocked in bunches of 1 second
-
local Defines OpenCL memmory space to use; ie. automatic or manual caching.
- global [default]
- local, local is often faster
- threads0 Number of threads in dimension 0 (time)
- threads1 Number of threads in dimension 1 (dm)
- items0 Tiling factor in dimension 0: ie. the number of items per thread
- items1 Tiling factor in dimension 1: ie. the number of items per thread
- unroll How far to unroll loops
- iterations Number of samples for a given configuration.
- min_threads Minimum number of threads to use. Use this to reduce the parameter space.
- max_threads Limits on total number of threads
- max_items Maximum value on item0 + item1
- max_unroll Maximum value unroll parameter
- max_loopsize Some cards have problems with (too) large codes, this limits total kernel size.
- max_columns Limit on length of dimension 0
- max_rows Limit on length of dimension 1
Kernel statistics can be saved to a database, and analyzed to find the optimal configuration.
Install mariadb, fi. via your package manager. Then:
- log in to the database:
$ mysql
- create a database to hold our tuning data:
create database AAALERT
- make sure we can use it (replace USER with your username):
grant all privileges on AAALERT.* to 'USER'@'localhost';
- copy the template configuration file:
cp analysis/config.py.orig analysis/config.py
and enter your configuration.
The analysis scripts use some python3 packages. An easy way to set this up is using virtualenv
:
$ cd $INSTALL_ROOT/Dedispersion/analysis`
$ virtualenv --system-site-packages --python=python3 env`
$ . env/bin/activate`
And then install the missing packages:
$ pip install pymysql
The analysis is controlled by the analysis/dedispersin.py
script.
It prints data as space-separated data to stdout, where you can plot it with fi. gnuplot, or copy-paste it in your favorite spreadsheet.
You can also write it to a file, that can then be read by the dedispersion code.
- List current tables:
./dedispersion.py list
- Create a table:
./dedispersion.py create <table name>
- Enter a file create with DedispersionTuning into the database:
./dedispersion load <table name> <file name>
- Find optimal kernel configuration:
./dedispersion.py tune <table name> max <channels> <samples>
The tune subcommand also takes a number of different parameters: ./dedispersion.py tune <table> <operator> <channels> <samples> [local|cache] [split|cont]
- operator: max, min, avg, std (SQL aggergation commands)
- channels: number of channels
- samples: number of samples
- local|cache When specified, only consider local or cache kernels. See tuning document.
- split|cont When specified, only consider with or without the split_second option. See tuning document.
The code is based on templates, for running the test pipeline we need to define some actual types. This file contains the datatypes used by this package.
Contains getShifts()
that returns for each frequently channel the shift part without the dispersion measure (dm).
Classses holding the implementation of the kernels for CPU and GPU.
Licensed under the Apache License, Version 2.0.