Runtimes and Benchmarking
The number of TE calculations scales with O(P² d S), where P is the number of processes, d the average in-degree that is eventually inferred, and S the number of surrogate calculations, assuming that d is independent of network size P. Note that in the worst case of a fully connected network, d = P, which leads to cubic runtimes.
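The scaling above can be sketched as a simple counting function (a hypothetical helper for illustration, not part of IDTxl's API):

```python
def num_te_calculations(P, d, S):
    """Rough count of TE calculations for full-network inference:
    P targets, each tested against O(P) candidate sources, with an
    average inferred in-degree d and S surrogate calculations.
    Constants are ignored; this only reproduces the O(P^2 d S) scaling."""
    return P * P * d * S

# Worst case: a fully connected network has d = P, giving cubic
# scaling in the number of processes P.
```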
IDTxl's network inference algorithm uses parallelisation to speed up computation:
- For CPU estimators we parallelise over targets, with O(P d S) TE calculations per target for the KSG estimator and O(P d) TE calculations for the Gaussian estimator using analytical surrogates. Each calculation takes O(K N log N) for the KSG estimator and O(N) for the Gaussian estimator (N is the number of time series samples, K the number of nearest neighbours).
- For GPU estimators we parallelise over targets and surrogates. For a single target we have O(P d) TE calculations including surrogates, assuming that all surrogates for a source fit into the GPU's main memory and can be processed in parallel. Each calculation on the GPU takes O(N²) time. Note that we additionally parallelise over the N sample points depending on available memory, which leads to faster run times in practice.
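The per-calculation costs in the two bullets above can be summarised in a constants-free cost model (hypothetical helpers for illustration; real runtimes depend on implementation details and hardware):

```python
import math

def ksg_cpu_cost(N, K):
    """KSG estimator on the CPU: neighbour searches cost O(K N log N)."""
    return K * N * math.log(N)

def gaussian_cpu_cost(N):
    """Gaussian estimator: covariance-based, O(N)."""
    return N

def ksg_gpu_cost(N):
    """GPU KSG estimator: brute-force pairwise distances, O(N^2) work,
    offset in practice by parallelising over the N sample points."""
    return N * N
```

Comparing these models makes the trade-off explicit: the GPU does asymptotically more work per calculation but amortises it across many parallel threads and surrogates.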
For example, for network inference using the CPU and GPU KSG implementations, we obtain the following runtimes for a single target on one computing node (S=200 surrogates, in-degree of d=3, coupled logistic map dynamics, (Novelli, CNS, 2018)):
| P | N | Runtime in hours (CPU) | Runtime in hours (GPU) |
|---|---|---|---|
| 10 | 100 | 0.01 | 0.005 |
| 10 | 1000 | 0.25 | 0.05 |
| 100 | 100 | 0.15 | 0.03 |
| 100 | 1000 | 5.50 | 0.45 |
Hardware used: Intel Xeon CPUs with clock speeds of 2.1-2.6 GHz and an NVIDIA V100 SXM2 GPU with 16 GB RAM (maximum runtimes over 10 repetitions).
The practical runtime for a full network analysis that considers each process, P, as a target can be parallelised over targets and thus depends on the number of available computing nodes. In the worst case where only a single node is available, the full runtime is the single-target runtime multiplied by P. In the best case where the number of computing nodes is equal to P, the full runtime is equal to the single-target runtime because all targets can be analysed in parallel.
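This target-level parallelisation can be sketched as follows (a minimal model with a hypothetical helper, assuming targets are spread evenly over nodes and each node processes its share sequentially):

```python
import math

def full_network_runtime(t_single, P, n_nodes):
    """Full-network runtime when P targets are distributed over
    n_nodes computing nodes; each node handles ceil(P / n_nodes)
    targets one after another."""
    return math.ceil(P / n_nodes) * t_single

# With a single node, the runtime is P times the single-target
# runtime; with n_nodes == P, it equals the single-target runtime.
```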