Runtimes and Benchmarking
The number of TE calculations scales with O(P² d S), where P is the number of processes, d the average in-degree that is eventually inferred, and S the number of surrogate calculations, assuming that d is independent of network size P. Note that in the worst case of a fully connected network, d = P, which leads to cubic runtimes.
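The scaling above can be sketched as a simple counting function (a hypothetical helper for illustration, not part of IDTxl's API):

```python
def num_te_calculations(P, d, S):
    """Rough count of TE calculations for full-network inference:
    P targets, each tested against O(P) candidate sources, with an
    average inferred in-degree d and S surrogate calculations.
    Constants are ignored; this only reproduces the O(P^2 d S) scaling."""
    return P * P * d * S

# Worst case: a fully connected network has d = P, giving cubic
# scaling in the number of processes P.
```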
IDTxl's network inference algorithm uses parallelisation to speed up computation:
- For CPU estimators we parallelise over targets, with O(P d S) TE calculations per target for the KSG estimator and O(P d) TE calculations for the Gaussian estimator using analytical surrogates. Each calculation takes O(K N log N) for the KSG estimator and O(N) for the Gaussian estimator (N is the number of time series samples, K the number of nearest neighbours).
- For GPU estimators we parallelise over targets and surrogates. For a single target we have O(P d) TE calculations including surrogates, assuming that all surrogates for a source fit into the GPU's main memory and can be processed in parallel. Each calculation on the GPU takes O(N²) time. Note that we additionally parallelise over the N sample points depending on available memory, which leads to faster run times in practice.
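The per-calculation costs in the two bullets above can be summarised in a constants-free cost model (hypothetical helpers for illustration; real runtimes depend on implementation details and hardware):

```python
import math

def ksg_cpu_cost(N, K):
    """KSG estimator on the CPU: neighbour searches cost O(K N log N)."""
    return K * N * math.log(N)

def gaussian_cpu_cost(N):
    """Gaussian estimator: covariance-based, O(N)."""
    return N

def ksg_gpu_cost(N):
    """GPU KSG estimator: brute-force pairwise distances, O(N^2) work,
    offset in practice by parallelising over the N sample points."""
    return N * N
```

Comparing these models makes the trade-off explicit: the GPU does asymptotically more work per calculation but amortises it across many parallel threads and surrogates.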
For example, for network inference using the CPU and GPU KSG implementations, we obtain the following runtimes for a single target on one computing node (S=200 surrogates, in-degree of d=3, coupled logistic map dynamics, (Novelli, CNS, 2018)):
| P | N | Runtime in hours (CPU) | Runtime in hours (GPU) |
|---|---|---|---|
| 10 | 100 | 0.01 | 0.005 |
| 10 | 1000 | 0.25 | 0.05 |
| 100 | 100 | 0.15 | 0.03 |
| 100 | 1000 | 5.50 | 0.45 |
Hardware used: Intel Xeon CPUs with clock speeds of 2.1-2.6 GHz and an NVIDIA V100 SXM2 GPU with 16 GB RAM (maximum runtimes over 10 repetitions).
The practical runtime for a full network analysis that considers each process, P, as a target can be parallelised over targets and thus depends on the number of available computing nodes. In the worst case where only a single node is available, the full runtime is the single-target runtime multiplied by P. In the best case where the number of computing nodes is equal to P, the full runtime is equal to the single-target runtime because all targets can be analysed in parallel.
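This target-level parallelisation can be sketched as follows (a minimal model with a hypothetical helper, assuming targets are spread evenly over nodes and each node processes its share sequentially):

```python
import math

def full_network_runtime(t_single, P, n_nodes):
    """Full-network runtime when P targets are distributed over
    n_nodes computing nodes; each node handles ceil(P / n_nodes)
    targets one after another."""
    return math.ceil(P / n_nodes) * t_single

# With a single node, the runtime is P times the single-target
# runtime; with n_nodes == P, it equals the single-target runtime.
```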