Generation of Surrogate Data
The Data class provides various strategies for generating surrogate data for permutation testing (see Statistical significance tests).
To create surrogate data, IDTxl by default takes one of the variables entering the (conditional) mutual information estimation and randomly permutes its replications (section Permutation of replications below). If the number of replications is not sufficient to generate the required number of surrogates, the fall-back option is to permute the samples of that variable in time (section Permutation of samples in time below).
Permutation of replications
By default, IDTxl tries to generate surrogates by permuting the replications of one of the variables used in the estimation. This is done by calling Data.get_realisations() with shuffle=True. When shuffling replications, whole blocks of data are permuted while the temporal order of samples stays intact within each replication:
Original data:
+--------------+---------+---------+---------+---------+---------+-----+
| repl. ind.   | 1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4 | 5 5 5 5 | ... |
+--------------+---------+---------+---------+---------+---------+-----+
| sample index | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | ... |
+--------------+---------+---------+---------+---------+---------+-----+
Shuffled data:
+--------------+---------+---------+---------+---------+---------+-----+
| repl. ind.   | 3 3 3 3 | 1 1 1 1 | 4 4 4 4 | 2 2 2 2 | 5 5 5 5 | ... |
+--------------+---------+---------+---------+---------+---------+-----+
| sample index | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4 | ... |
+--------------+---------+---------+---------+---------+---------+-----+
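The shuffling shown in the tables above can be sketched in plain Python. This is an illustration only, not IDTxl's implementation: the function name shuffle_replications and the list-of-lists data layout are assumptions made for this example.

```python
import random


def shuffle_replications(data, rng=None):
    """Return a copy of data with whole replications in random order.

    data is a list of replications, each a list of samples. Samples keep
    their temporal order within every replication.
    (Hypothetical helper for illustration, not part of IDTxl.)
    """
    rng = rng or random.Random()
    order = list(range(len(data)))
    rng.shuffle(order)  # permute replication indices only
    return [list(data[i]) for i in order], order


# Five replications with four samples each, labelled (replication, sample).
original = [[(r, s) for s in range(1, 5)] for r in range(1, 6)]
surrogate, order = shuffle_replications(original, random.Random(0))

for block in surrogate:
    # Each block still belongs to one replication, with samples 1..4 in order.
    print([r for r, _ in block], [s for _, s in block])
```

Because only the order of the blocks changes, any temporal structure inside a replication is preserved in the surrogate.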
Permutation of samples in time
If the number of replications is not sufficient to generate the desired number of surrogates, surrogates are created by shuffling samples in time. Users can also request this behavior explicitly by setting permute_in_time to True when calling any network inference algorithm. Surrogate generation happens in the function Data.permute_samples().
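One plausible way to decide whether the number of replications is sufficient is to compare the number of distinct replication orderings (the factorial of the number of replications) with the number of surrogates requested. The sketch below assumes that criterion. The names sufficient_replications, choose_strategy and n_perm are hypothetical, and the check IDTxl actually performs may differ.

```python
import math


def sufficient_replications(n_replications, n_perm):
    """True if there are more distinct replication orderings than surrogates.

    Assumed criterion for illustration: n_replications! > n_perm.
    """
    if n_perm <= 0:
        raise ValueError("n_perm must be positive")
    return math.factorial(n_replications) > n_perm


def choose_strategy(n_replications, n_perm, permute_in_time=False):
    """Pick 'samples' if requested explicitly or if replications are too few."""
    if permute_in_time or not sufficient_replications(n_replications, n_perm):
        return "samples"
    return "replications"


print(choose_strategy(5, 500))         # samples (5! = 120 orderings only)
print(choose_strategy(10, 500))        # replications
print(choose_strategy(10, 500, True))  # samples (explicit request)
```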
Various strategies for permuting samples are implemented and can be selected by setting 'perm_type' to any of the following options (all options are passed via the settings dict when calling network inference algorithms):
- 'random': shuffles samples at random
- 'circular': shifts the time series by a random number of samples
  - Set 'max_shift' to define the maximum number of samples for shifting (e.g., number of samples / 2)
- 'block': swaps blocks of samples
  - Set 'block_size' to define the number of samples per block (e.g., number of samples / 10)
  - Set 'perm_range' to define the range in which blocks can be swapped (e.g., number of samples / block_size)
- 'local': swaps samples within a given range
  - Set 'perm_range' to define the range in samples over which realisations can be permuted (e.g., number of samples / 10)
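As a sketch of how the options above might be combined, the following builds one settings dict per permutation type, using the example values from the list. The value of n_samples is hypothetical, and the dicts are deliberately incomplete: a real call to a network inference algorithm needs further settings (such as the estimator) that are outside the scope of this section.

```python
n_samples = 1000  # hypothetical number of samples per replication

# Explicitly request surrogates created by shuffling samples in time.
base = {'permute_in_time': True}

random_settings = {**base, 'perm_type': 'random'}

circular_settings = {
    **base,
    'perm_type': 'circular',
    'max_shift': n_samples // 2,  # e.g., number of samples / 2
}

block_size = n_samples // 10      # e.g., number of samples / 10
block_settings = {
    **base,
    'perm_type': 'block',
    'block_size': block_size,
    'perm_range': n_samples // block_size,  # e.g., number of samples / block_size
}

local_settings = {
    **base,
    'perm_type': 'local',
    'perm_range': n_samples // 10,  # e.g., number of samples / 10
}
```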
The resulting surrogate data may look like the following:
Original data:
+--------------+-----------------+-----------------+-----------------+-----+
| repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
| sample index | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | 1 2 3 4 5 6 7 8 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
Circular shift by a random number of samples, e.g. 4 samples:
+--------------+-----------------+-----------------+-----------------+-----+
| repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
| sample index | 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4 | 5 6 7 8 1 2 3 4 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
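The circular shift in the table above can be reproduced with list slicing. The helpers below are hypothetical illustrations, not IDTxl functions, and drawing the shift uniformly between 1 and max_shift is an assumption of this sketch.

```python
import random


def circular_shift(samples, shift):
    """Rotate samples to the left by shift positions, wrapping around."""
    if not samples:
        return []
    shift %= len(samples)
    return samples[shift:] + samples[:shift]


def random_circular_shift(samples, max_shift, rng=None):
    """Rotate by a random shift between 1 and max_shift (assumed range)."""
    rng = rng or random.Random()
    return circular_shift(samples, rng.randint(1, max_shift))


print(circular_shift([1, 2, 3, 4, 5, 6, 7, 8], 4))
# [5, 6, 7, 8, 1, 2, 3, 4]
```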
Permute blocks of 3 samples:
+--------------+-----------------+-----------------+-----------------+-----+
| repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
| sample index | 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3 | 4 5 6 7 8 1 2 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
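A minimal sketch of block permutation: the samples are cut into consecutive blocks of block_size (the last block may be shorter) and the blocks are reordered. The function names are hypothetical, and for simplicity this version shuffles all blocks freely and does not model the 'perm_range' setting.

```python
import random


def split_blocks(samples, block_size):
    """Cut samples into consecutive blocks; the last one may be shorter."""
    return [samples[i:i + block_size]
            for i in range(0, len(samples), block_size)]


def permute_blocks(samples, block_size, rng=None):
    """Shuffle whole blocks while keeping the sample order inside each block."""
    rng = rng or random.Random()
    blocks = split_blocks(samples, block_size)
    rng.shuffle(blocks)
    return [s for block in blocks for s in block]


blocks = split_blocks([1, 2, 3, 4, 5, 6, 7, 8], 3)
print(blocks)  # [[1, 2, 3], [4, 5, 6], [7, 8]]

# The ordering from the table above: second block, third block, first block.
print(blocks[1] + blocks[2] + blocks[0])  # [4, 5, 6, 7, 8, 1, 2, 3]
```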
Permute data locally within a range of 4 samples:
+--------------+-----------------+-----------------+-----------------+-----+
| repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
| sample index | 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7 | 1 2 4 3 8 5 6 7 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
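Local permutation can be sketched as shuffling samples inside consecutive windows of perm_range samples, so that no sample leaves its window. The function name and the use of non-overlapping windows are assumptions of this example, chosen to match the table above.

```python
import random


def permute_locally(samples, perm_range, rng=None):
    """Shuffle samples within consecutive, non-overlapping windows."""
    rng = rng or random.Random()
    surrogate = []
    for start in range(0, len(samples), perm_range):
        window = samples[start:start + perm_range]  # slicing makes a copy
        rng.shuffle(window)
        surrogate.extend(window)
    return surrogate


result = permute_locally([1, 2, 3, 4, 5, 6, 7, 8], 4, random.Random(1))
# Samples 1-4 stay in the first four positions, samples 5-8 in the last four.
print(result[:4], result[4:])
```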
Random permutation:
+--------------+-----------------+-----------------+-----------------+-----+
| repl. ind.   | 1 1 1 1 1 1 1 1 | 2 2 2 2 2 2 2 2 | 3 3 3 3 3 3 3 3 | ... |
+--------------+-----------------+-----------------+-----------------+-----+
| sample index | 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6 | 4 2 5 7 1 3 8 6 | ... |
+--------------+-----------------+-----------------+-----------------+-----+