
Runtimes and Benchmarking


Theoretical and practical runtimes

The number of TE calculations scales as O(P² d S), where P is the number of processes, d is the average in-degree that is eventually inferred, and S is the number of surrogate calculations; this assumes that d is independent of the network size P. Note that in the worst case of a fully connected network, d = P − 1, which leads to runtimes cubic in P.
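As a back-of-the-envelope illustration of this scaling, the following snippet counts the TE estimations implied by the formula above (the helper function is purely illustrative and not part of IDTxl):

```python
# Illustration of the O(P² d S) scaling of the number of TE estimations.
# Purely illustrative helper; not part of IDTxl.
def n_te_calculations(n_processes, avg_in_degree, n_surrogates):
    """Approximate number of TE estimations for a full network analysis."""
    return n_processes ** 2 * avg_in_degree * n_surrogates

# Example: P = 100 processes, in-degree d = 3, S = 200 surrogates.
print(n_te_calculations(100, 3, 200))  # 6000000 TE estimations
```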

IDTxl's network inference algorithm uses parallelisation to speed up the analysis (throughout, N denotes the number of time series samples; a sketch of how to select the corresponding estimators follows this list):

  • For CPU estimators, we parallelise over targets, with O(P d S) TE calculations per target for the KSG estimator and O(P d) TE calculations for the Gaussian estimator, which uses analytical surrogates. Each calculation takes O(K N log N) for the KSG estimator, where K is the number of nearest neighbours, and O(N) for the Gaussian estimator.
  • For GPU estimators, we parallelise over targets and surrogates. For a single target, this results in O(P d) TE calculations including surrogates, assuming that all surrogates for a source fit into the GPU's main memory and can be processed in parallel. Each calculation on the GPU takes O(N²) time. Note that we additionally parallelise over the N samples, depending on available memory, which leads to faster runtimes in practice.
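For concreteness, here is a minimal sketch of how the corresponding estimators can be selected in IDTxl. The settings keys follow the IDTxl documentation; the example data and lag values are placeholders only:

```python
# Minimal sketch: selecting a CPU or GPU estimator for network inference.
# Settings keys follow the IDTxl documentation; lag values are placeholders.
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE

data = Data()  # example data generated by IDTxl; replace with your own

settings = {
    'cmi_estimator': 'JidtKraskovCMI',    # CPU KSG estimator
    # 'cmi_estimator': 'JidtGaussianCMI',  # CPU Gaussian estimator
    # 'cmi_estimator': 'OpenCLKraskovCMI', # GPU KSG estimator
    'max_lag_sources': 5,
    'min_lag_sources': 1,
}

network_analysis = MultivariateTE()
results = network_analysis.analyse_network(settings=settings, data=data)
```

Switching between the commented estimator names moves the same analysis between the CPU KSG, CPU Gaussian, and GPU KSG implementations discussed above.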

For example, for network inference using the CPU and GPU KSG implementations, we obtain the following runtimes for a single target on one computing node (S = 200 surrogates, in-degree d = 3, coupled logistic map dynamics; Novelli, CNS, 2018):

| P   | N    | Runtime in hours (CPU) | Runtime in hours (GPU) |
|-----|------|------------------------|------------------------|
| 10  | 100  | 0.01                   | 0.005                  |
| 10  | 1000 | 0.25                   | 0.05                   |
| 100 | 100  | 0.15                   | 0.03                   |
| 100 | 1000 | 5.50                   | 0.45                   |

Hardware used: Intel Xeon CPUs with clock speeds of 2.1–2.6 GHz and an NVIDIA V100 SXM2 GPU with 16 GB of memory; reported times are the maximum runtimes over 10 repetitions.

The practical runtime for a full network analysis that considers each of the P processes as a target can be parallelised over targets and thus depends on the number of available computing nodes. In the worst case, where only a single node is available, the full runtime is the single-target runtime multiplied by P. In the best case, where the number of computing nodes equals P, the full runtime equals the single-target runtime because all targets are analysed in parallel (see the sketch below).
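A minimal sketch of this target-level parallelisation, assuming each target is analysed as an independent job (for example, one per cluster node); target_id and the output file name are placeholders for values supplied by a batch script:

```python
# Sketch: analyse a single target as an independent job; running one such
# job per target parallelises a full network analysis over targets.
import pickle

from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE

data = Data()  # example data; replace with your own recordings
settings = {'cmi_estimator': 'JidtKraskovCMI',
            'max_lag_sources': 5,
            'min_lag_sources': 1}

target_id = 0  # placeholder, e.g. int(sys.argv[1]) in a batch job
result = MultivariateTE().analyse_single_target(
    settings=settings, data=data, target=target_id)

# Save the single-target result for later collection.
with open('result_target_{}.p'.format(target_id), 'wb') as f:
    pickle.dump(result, f)
```

Results saved per target can afterwards be collected on a single machine and combined into a network-level result; IDTxl provides an FDR correction across targets in idtxl.stats.network_fdr.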
