Statistical comparison between sets of networks
IDTxl implements functionality to statistically test two sets of inferred networks against each other (`NetworkComparison()` class). All tests use permutation statistics and are thus non-parametric. IDTxl provides a dedicated plotting function to visualize results (`plot_network_comparison()`, see also the corresponding tutorial).
It is also possible to compare individual links within a single network.
For a detailed description of the results returned by network comparison, have a look at the tutorial on the `ResultsNetworkComparison()` class.
Note that sets of TE estimates generated in IDTxl can also be tested against each other using tools outside the toolbox (for example, to compare more than two sets). In this case, make sure that all parameters that potentially affect estimation bias are held constant across estimates (subjects, conditions, etc.). Such parameters are the number of samples, the number of replications, and the size of the conditioning set. See our demo script for an example of how to set this up.
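Before handing TE estimates to an external test, it can help to verify programmatically that the bias-relevant parameters listed above really are identical across estimates. The following is a minimal sketch under assumptions: each estimate is a plain dict, and the field names `n_samples`, `n_replications` and `n_conditioning_vars` are hypothetical, not IDTxl result attributes.

```python
# Hypothetical record layout: each TE estimate is stored as a dict together
# with the parameters that can affect its estimation bias. These field names
# are illustrative assumptions, not IDTxl attributes.
BIAS_RELEVANT_KEYS = ("n_samples", "n_replications", "n_conditioning_vars")


def check_comparable(estimates):
    """Raise ValueError if any bias-relevant parameter differs across estimates."""
    reference = {key: estimates[0][key] for key in BIAS_RELEVANT_KEYS}
    for i, est in enumerate(estimates[1:], start=1):
        for key in BIAS_RELEVANT_KEYS:
            if est[key] != reference[key]:
                raise ValueError(
                    f"estimate {i}: {key}={est[key]!r} differs from "
                    f"{reference[key]!r}; TE values may carry different bias"
                )
    return True


estimates = [
    {"te": 0.12, "n_samples": 1000, "n_replications": 50, "n_conditioning_vars": 3},
    {"te": 0.09, "n_samples": 1000, "n_replications": 50, "n_conditioning_vars": 3},
]
check_comparable(estimates)  # passes: all bias-relevant parameters match
```

Failing loudly is deliberate here: a mismatch means differences in TE could reflect different estimation bias rather than a real effect.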
Below, we describe how to perform statistical comparisons when the individual units of data that can be permuted do not allow for TE estimation in isolation, for example, single trials from a neuroscience experiment.
IDTxl implements four types of comparison. For illustration, we will assume a neuroscience setting, where data were collected from one or more participants under two different experimental conditions, A and B.
Depending on the exact experimental setup, one of four types of statistical comparison applies (UO = unit of observation), defined by two factors:
- between (UO = participants) versus within (UO = replications) design, and
- dependent versus independent samples.
Between versus within here describes whether the experimental variation happened within a single subject or between two groups of subjects. For example, the experimental protocol could require that participant group A viewed only images of houses, while group B viewed images of faces. Each participant views multiple replications of their stimulus. Here, the experimental manipulation (image type) happens between participants and we would want to test mTE networks inferred for participant group A against networks inferred from participant group B. The units we compare against each other represent individual participants.
In contrast, the experiment could be set up such that a single participant views a set of images A (e.g., houses) over multiple replications and then a second set of images B (e.g., faces) over the remaining replications. Here, the experimental manipulation happens within the participant and we would want to test inferred networks for the subset of replications observed under condition A against networks inferred from replications observed under condition B. The units we compare against each other represent individual replications within a single participant.
Dependent versus independent samples describes whether each unit of observation in one condition belongs to a specific unit in the other condition, or whether units in the two sets are independent of each other. As an example of dependent samples in our between design, we might wish to match and compare each participant in group A (e.g., patients) with a specific participant in group B (e.g., matched controls). An example of dependent samples in a within design is a setup where we compare a baseline activation against a task condition such that each baseline replication belongs to a specific task replication.
In summary:

| comparison type | stats_type | example |
|---|---|---|
| within | dependent | baseline (A) vs. task (B) |
| within | independent | detect house (A) vs. face (B) |
| between | dependent | patients (A) vs. matched controls (B) |
| between | independent | male (A) vs. female (B) participants |

Whether a within or a between test is performed is determined by calling either the `compare_within()` or the `compare_between()` method of the `NetworkComparison()` class, while the `'independent'` versus `'dependent'` test type is set via the `'stats_type'` entry of the settings passed to these methods.
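The mapping from the table above to the two methods can be condensed into a small dispatcher. This is a hypothetical helper, not part of IDTxl: it assumes `comp` is a `NetworkComparison()` instance and calls `compare_within()` and `compare_between()` exactly as in the examples on this page, setting `'stats_type'` on a copy of the settings.

```python
def run_comparison(comp, settings, design, stats_type, set_a, set_b, data_a, data_b):
    """Dispatch to compare_within or compare_between (hypothetical helper).

    comp is expected to be an idtxl NetworkComparison instance; design must be
    'within' or 'between', stats_type 'dependent' or 'independent'.
    """
    if stats_type not in ("dependent", "independent"):
        raise ValueError(f"unknown stats_type: {stats_type!r}")
    # Copy the settings so the caller's dict is not modified.
    settings = dict(settings, stats_type=stats_type)
    if design == "within":
        # Within: one network and one data set (of replications) per condition.
        return comp.compare_within(settings, set_a, set_b, data_a, data_b)
    if design == "between":
        # Between: arrays of networks and data sets, one entry per unit of observation.
        return comp.compare_between(
            settings,
            network_set_a=set_a,
            network_set_b=set_b,
            data_set_a=data_a,
            data_set_b=data_b,
        )
    raise ValueError(f"unknown design: {design!r}")
```

Keeping the design choice as a method call and the sample type as a setting mirrors the split described above.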
```python
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE
from idtxl.network_comparison import NetworkComparison

# Generate example data and perform network inference.
data_a = Data()
data_a.generate_mute_data(100, 5)
data_b = Data()
data_b.generate_mute_data(100, 5)
settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 1,
    'permute_in_time': True
}
# Run analysis with different targets to simulate different results for both
# analyses.
network_analysis = MultivariateTE()
res_a = network_analysis.analyse_network(settings, data_a, targets=[0, 1], sources='all')
res_b = network_analysis.analyse_network(settings, data_b, targets=[1, 2], sources='all')

# Compare the two networks. Permutation numbers are kept small so the example
# runs quickly; real analyses need many more permutations.
comp = NetworkComparison()
comp_settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'alpha_comp': 0.26,
    'n_perm_comp': 4,
    'tail': 'two'
}
comp_settings['stats_type'] = 'dependent'
comp.compare_within(comp_settings, res_a, res_b, data_a, data_b)  # dependent within
comp_settings['stats_type'] = 'independent'
comp.compare_within(comp_settings, res_a, res_b, data_a, data_b)  # independent within
```
Following the example of a within design from the introduction, the data sets `data_a` and `data_b` here correspond to replications collected from a single participant under two conditions, A and B.
```python
import numpy as np
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE
from idtxl.network_comparison import NetworkComparison

# Generate example data and perform network inference. Each data set stands
# for one participant; suffixes _a and _b denote the two conditions.
data_a_0 = Data()
data_a_0.generate_mute_data(100, 5)
data_a_1 = Data()
data_a_1.generate_mute_data(100, 5)
data_b_0 = Data()
data_b_0.generate_mute_data(100, 5)
data_b_1 = Data()
data_b_1.generate_mute_data(100, 5)
settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 1,
    'permute_in_time': True
}
# Run analysis with different targets to simulate different results for both
# analyses.
network_analysis = MultivariateTE()
res_a_0 = network_analysis.analyse_network(settings, data_a_0, targets=[0, 1], sources='all')
res_a_1 = network_analysis.analyse_network(settings, data_a_1, targets=[1, 2], sources='all')
res_b_0 = network_analysis.analyse_network(settings, data_b_0, targets=[0, 2], sources='all')
res_b_1 = network_analysis.analyse_network(settings, data_b_1, targets=[0, 1, 2], sources='all')

comp = NetworkComparison()
comp_settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'alpha_comp': 0.26,
    'n_perm_comp': 4,
    'tail': 'two'
}
comp_settings['stats_type'] = 'dependent'
comp.compare_between(  # dependent between
    comp_settings,
    network_set_a=np.array((res_a_0, res_a_1)),
    network_set_b=np.array((res_b_0, res_b_1)),
    data_set_a=np.array((data_a_0, data_a_1)),
    data_set_b=np.array((data_b_0, data_b_1))
)
comp_settings['stats_type'] = 'independent'
comp.compare_between(  # independent between
    comp_settings,
    network_set_a=np.array((res_a_0, res_a_1)),
    network_set_b=np.array((res_b_0, res_b_1)),
    data_set_a=np.array((data_a_0, data_a_1)),
    data_set_b=np.array((data_b_0, data_b_1))
)
```
Following the example of a between design from the introduction, the collections of data sets `np.array((data_a_0, data_a_1))` and `np.array((data_b_0, data_b_1))` here correspond to data collected from individual participants (two per condition) in one of the two conditions, A and B.
The four types of comparison differ in how the surrogate distribution for the statistical test is generated. Below, we first describe how the test statistic is computed and then describe the generation of surrogate distributions in detail.
For all tests, we first determine the difference between the two inferred networks as the statistic to be tested:
- Find the union of links of all involved networks (union network):
  - Within: union across both conditions
  - Between: union across all units of observation and across conditions
- For each link in the union network, estimate mTE for both conditions:
  - Within: estimate mTE values for each link in the union network, once from data collected under condition A and once from data collected under condition B
  - Between: estimate mTE values for each link in the union network from data collected for each unit of observation, across conditions
- For each link, calculate the difference in raw TE estimates; this is the test statistic used in the permutation test:
  - Within: calculate the difference in mTE for each link between the two conditions
  - Between: calculate the mean difference in mTE for each link between the two sets of networks
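The three steps above can be sketched in plain Python. This is an illustration under assumptions, not IDTxl's implementation: inferred networks are represented as sets of `(source, target)` pairs, and `estimate_te(data, link)` is a hypothetical stand-in for an mTE estimator. For equally sized, paired sets the mean of per-pair differences equals the difference of group means, which is what the between statistic computes here.

```python
import numpy as np


def union_network(*networks):
    """Union of links across networks; each network is a set of (source, target) pairs."""
    links = set()
    for net in networks:
        links |= set(net)
    return sorted(links)


def within_statistic(links, estimate_te, data_a, data_b):
    """Per-link difference in TE between condition A and condition B (within design)."""
    return {link: estimate_te(data_a, link) - estimate_te(data_b, link) for link in links}


def between_statistic(links, estimate_te, data_set_a, data_set_b):
    """Per-link difference of mean TE over units in A and units in B (between design)."""
    stat = {}
    for link in links:
        te_a = [estimate_te(d, link) for d in data_set_a]
        te_b = [estimate_te(d, link) for d in data_set_b]
        stat[link] = np.mean(te_a) - np.mean(te_b)
    return stat


# Toy estimator for illustration only: "data" is a dict of precomputed TE values.
toy_estimator = lambda data, link: data.get(link, 0.0)
links = union_network({(0, 1)}, {(0, 1), (1, 2)})  # [(0, 1), (1, 2)]
```

Note that every link in the union is estimated in both conditions, including links that were only inferred in one of them; there the toy estimator returns 0.0.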
Note that a link denotes the total mTE from a source process to a target, where the source process can have many relevant past variables selected through non-uniform embedding.
Next, we generate a surrogate distribution for each link. Here, the within and between tests differ:
- Within:
  - Swap replications randomly between the two conditions
  - For each link and each condition, estimate TE from the swapped data
  - For each link, calculate the difference between the TE estimates obtained from the swapped data for the two conditions

  Repeat until the distribution of surrogate values is sufficiently large (e.g., 500 times). Test the difference in mTE values for each individual link against its surrogate distribution.
- Between:
  - Swap estimated mTE values between subjects
  - For each link, calculate the mean difference between the swapped TE estimates

  Repeat until the distribution of surrogate values is sufficiently large (e.g., 500 times). Test the original mean difference in mTE values against the distribution of differences from swapped data.
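The within procedure for a single link can be sketched as follows. This is a minimal sketch, not IDTxl's implementation: `estimate_te` is a hypothetical stand-in for an mTE estimator applied to a list of replications, swapping uses the independent scheme (pool, permute, split; the dependent variant is described below), and the p-value is taken as the plain fraction of surrogate differences at least as extreme as the observed one.

```python
import numpy as np


def within_link_test(reps_a, reps_b, estimate_te, n_perm=500, seed=None):
    """Two-sided permutation test for one link in a within design.

    reps_a, reps_b: sequences of replications recorded under conditions A and B.
    estimate_te: hypothetical callable estimating TE for this link from a list
        of replications; it is called again for every permutation.
    Returns the observed difference and the p-value.
    """
    rng = np.random.default_rng(seed)
    observed = estimate_te(reps_a) - estimate_te(reps_b)
    pooled = list(reps_a) + list(reps_b)
    n_a = len(reps_a)
    surrogate = np.empty(n_perm)
    for i in range(n_perm):
        # Pool, permute and split replications into two surrogate conditions.
        order = rng.permutation(len(pooled))
        swap_a = [pooled[j] for j in order[:n_a]]
        swap_b = [pooled[j] for j in order[n_a:]]
        # Re-estimate TE on the swapped data: this is the expensive step.
        surrogate[i] = estimate_te(swap_a) - estimate_te(swap_b)
    # Fraction of surrogate differences at least as extreme as the observed one.
    p_value = np.mean(np.abs(surrogate) >= abs(observed))
    return observed, p_value
```

With a toy estimator such as `np.mean` over scalar "replications", `within_link_test([1.0] * 20, [0.0] * 20, np.mean, seed=0)` returns an observed difference of 1.0 and a p-value close to zero.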
Note that generating the surrogate data for between tests is much faster because it does not require re-estimating the mTE for each surrogate value. Instead, for between tests, we estimate the mTE values once, permute these estimates, and repeatedly calculate the difference between the swapped estimates.
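The speed difference is visible in a sketch of the between procedure for a single link: the TE estimates are computed once beforehand, and each permutation only reshuffles numbers. As before, this is an illustrative sketch rather than IDTxl's code; it uses independent swapping (pool, permute, split) and the plain-fraction p-value.

```python
import numpy as np


def between_link_test(te_a, te_b, n_perm=500, seed=None):
    """Two-sided permutation test for one link in a between design.

    te_a, te_b: TE estimates for this link, one per unit of observation
    (e.g., participant) in groups A and B, estimated once beforehand.
    Only the estimates are permuted; no TE is re-estimated.
    """
    rng = np.random.default_rng(seed)
    te_a = np.asarray(te_a, dtype=float)
    te_b = np.asarray(te_b, dtype=float)
    observed = te_a.mean() - te_b.mean()
    pooled = np.concatenate([te_a, te_b])
    n_a = te_a.size
    surrogate = np.empty(n_perm)
    for i in range(n_perm):
        # Pool, permute and split the estimates, then recompute the statistic.
        shuffled = rng.permutation(pooled)
        surrogate[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    p_value = np.mean(np.abs(surrogate) >= abs(observed))
    return observed, p_value
```

Each permutation costs a few array operations here, whereas the within sketch calls the estimator twice per permutation.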
Whether the test is dependent or independent affects the swapping step, i.e., the first step of both the within and the between procedure for generating surrogate data:
- For a dependent test, swapping happens between matched UOs only: in the within example, each baseline replication has a corresponding task replication, and we randomly swap (or keep) each baseline replication with its task replication; in the between example, we swap the mTE estimate of a patient with the estimate of the corresponding control participant.
- For an independent test, swapping happens randomly between conditions: in the within example, replications from conditions A and B are pooled, permuted, and split to obtain a random partition of replications; in the between example, we pool, permute, and split mTE estimates randomly.
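The two swapping schemes can be contrasted directly. The sketch below works on 1-D arrays of per-unit values, for example mTE estimates in the between case or replication indices in the within case; the function names are illustrative and not part of IDTxl.

```python
import numpy as np


def swap_dependent(a, b, rng):
    """Paired swap: each A unit may only trade places with its matched B unit."""
    a, b = np.asarray(a), np.asarray(b)
    flip = rng.random(a.shape[0]) < 0.5  # one coin flip per matched pair
    new_a = np.where(flip, b, a)
    new_b = np.where(flip, a, b)
    return new_a, new_b


def swap_independent(a, b, rng):
    """Pool, permute and split: any unit may end up in either condition."""
    a, b = np.asarray(a), np.asarray(b)
    pooled = rng.permutation(np.concatenate([a, b]))
    return pooled[: a.shape[0]], pooled[a.shape[0]:]


rng = np.random.default_rng(0)
baseline = np.array([1, 2, 3, 4])     # e.g., replication indices under baseline
task = np.array([10, 20, 30, 40])     # matched task replications
paired_a, paired_b = swap_dependent(baseline, task, rng)
pooled_a, pooled_b = swap_independent(baseline, task, rng)
```

After `swap_dependent`, position *k* still holds the *k*-th matched pair in some order; after `swap_independent`, the pairing is lost and only the group sizes are preserved.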