
Statistical comparison between sets of networks


IDTxl implements functionality to statistically test two sets of inferred networks against each other (NetworkComparison() class). All tests use permutation statistics and are thus non-parametric. IDTxl provides a dedicated plotting function to visualize results (plot_network_comparison(), see also the corresponding tutorial).

There is also the possibility to compare individual links within a single network. For a detailed description of the results returned by network comparison, have a look at the tutorial on the ResultsNetworkComparison() class.

Note that there is also the possibility to test sets of TE estimates generated in IDTxl against each other using tools outside of the toolbox (for example, for comparison of more than two sets). In this case, make sure that all parameters that potentially affect estimation bias are constant across estimates (subjects, conditions, etc.). Such parameters are number of samples, number of replications, and size of the conditioning set. See our demo script for an example on how to set this up.
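The following is a minimal sketch of such an external comparison, assuming the per-subject TE estimates have already been extracted from the IDTxl results and collected into arrays; the array names and values are hypothetical placeholders, and SciPy's Kruskal-Wallis test is used here only as one possible non-parametric choice for more than two sets.

import numpy as np
from scipy import stats

# Hypothetical per-subject TE estimates for three conditions. In practice,
# collect these from your IDTxl results and make sure the number of samples,
# number of replications, and size of the conditioning set are identical
# across all estimates.
te_cond_a = np.array([0.12, 0.09, 0.15, 0.11, 0.13])
te_cond_b = np.array([0.10, 0.08, 0.12, 0.09, 0.11])
te_cond_c = np.array([0.14, 0.10, 0.16, 0.12, 0.15])

# Non-parametric comparison of more than two sets (Kruskal-Wallis H-test).
h_stat, p_value = stats.kruskal(te_cond_a, te_cond_b, te_cond_c)
print(f'H = {h_stat:.2f}, p = {p_value:.3f}')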

Below we describe how to do statistical comparisons when the individual data that can be permuted do not allow for TE estimation in isolation, for example, a single trial from a neuroscience experiment.

Introduction

There are four possible types of comparisons implemented. For illustration, we will assume a neuroscience setting, where data was collected from a single or multiple participants under two different experimental conditions, A and B.

Depending on the exact experimental setup, one of four types of statistical comparison is possible (UO = units of observation):

  • between (UO = participants) versus within (UO = replications) design, and
  • dependent versus independent samples

Between versus within here describes whether the experimental variation happened within a single subject or between two groups of subjects. For example, the experimental protocol could require that participant group A viewed only images of houses, while group B viewed images of faces. Each participant views multiple replications of their stimulus. Here, the experimental manipulation (image type) happens between participants and we would want to test mTE networks inferred for participant group A against networks inferred from participant group B. The units we compare against each other represent individual participants.

In contrast, the experiment could be set up such that a single participant views a set of images A (e.g., houses) over multiple replications and then a second set of images B (e.g., faces) over the remaining replications. Here, the experimental manipulation happens within the participant and we would want to test inferred networks for the subset of replications observed under condition A against networks inferred from replications observed under condition B. The units we compare against each other represent individual replications within a single participant.

Dependent versus independent samples describes whether a unit of observation in one condition belongs to a specific unit in the second condition, or whether units in the two sets are independent of each other. As an example of dependent samples in our between design, we could wish to match and compare each participant in group A (e.g., patients) with a specific participant in group B (e.g., matched controls). An example of dependent samples in a within design is a setup where we compare baseline activation against a task condition such that a specific baseline replication belongs to a specific task replication.

In summary:

| comparison type | stats_type  | example                               |
|-----------------|-------------|---------------------------------------|
| within          | dependent   | baseline (A) vs. task (B)             |
| within          | independent | detect house (A) vs. face (B)         |
| between         | dependent   | patients (A) vs. matched controls (B) |
| between         | independent | male (A) vs. female (B) participants  |

Calling different types of group comparisons

Whether a within or between test is performed is determined by calling either the compare_within() or compare_between() method of the NetworkComparison() class, while a dependent versus an independent test is selected via the 'stats_type' entry in the settings dictionary passed to these methods.

Within test

# Import the required IDTxl classes (assuming a standard IDTxl installation).
from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE
from idtxl.network_comparison import NetworkComparison

# Generate example data and perform network inference.
data_a = Data()
data_a.generate_mute_data(100, 5)
data_b = Data()
data_b.generate_mute_data(100, 5)
settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 1,
    'permute_in_time': True
    }
# Run analysis with different targets to simulate different results for both
# analyses.
network_analysis = MultivariateTE()
res_a = network_analysis.analyse_network(settings, data_a, targets=[0, 1], sources='all')
res_b = network_analysis.analyse_network(settings, data_b, targets=[1, 2], sources='all')

comp = NetworkComparison()
comp_settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'alpha_comp': 0.26,  # demo values only: a lenient alpha and very few
    'n_perm_comp': 4,    # permutations so the example runs quickly
    'tail': 'two'
    }
comp_settings['stats_type'] = 'dependent'
comp.compare_within(comp_settings, res_a, res_b, data_a, data_b)  # dependent within
comp_settings['stats_type'] = 'independent'
comp.compare_within(comp_settings, res_a, res_b, data_a, data_b)  # independent within

Following the example of a within design from the introduction, the data sets data_a and data_b here correspond to replications collected from a single participant under two conditions, A and B.

Between test

# Imports as in the within example, plus NumPy for collecting results and data sets.
import numpy as np

from idtxl.data import Data
from idtxl.multivariate_te import MultivariateTE
from idtxl.network_comparison import NetworkComparison

# Generate example data and perform network inference.
data_a_0 = Data()
data_a_0.generate_mute_data(100, 5)
data_a_1 = Data()
data_a_1.generate_mute_data(100, 5)
data_b_0 = Data()
data_b_0.generate_mute_data(100, 5)
data_b_1 = Data()
data_b_1.generate_mute_data(100, 5)
settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'max_lag_target': 5,
    'max_lag_sources': 5,
    'min_lag_sources': 1,
    'permute_in_time': True
    }
# Run analysis with different targets to simulate different results for both
# analyses.
network_analysis = MultivariateTE()
res_a_0 = network_analysis.analyse_network(settings, data_a_0, targets=[0, 1], sources='all')
res_a_1 = network_analysis.analyse_network(settings, data_a_1, targets=[1, 2], sources='all')
res_b_0 = network_analysis.analyse_network(settings, data_b_0, targets=[0, 2], sources='all')
res_b_1 = network_analysis.analyse_network(settings, data_b_1, targets=[0, 1, 2], sources='all')

comp = NetworkComparison()
comp_settings = {
    'cmi_estimator': 'JidtKraskovCMI',
    'n_perm_max_stat': 50,
    'n_perm_min_stat': 50,
    'n_perm_omnibus': 200,
    'n_perm_max_seq': 50,
    'alpha_comp': 0.26,
    'n_perm_comp': 4,
    'tail': 'two'
    }

comp_settings['stats_type'] = 'dependent'
comp.compare_between(  # dependent between
    comp_settings,
    network_set_a=np.array((res_a_0, res_a_1)),
    network_set_b=np.array((res_b_0, res_b_1)),
    data_set_a=np.array((data_a_0, data_a_1)),
    data_set_b=np.array((data_b_0, data_b_1))
    )

comp_settings['stats_type'] = 'independent'
comp.compare_between(  # independent between
    comp_settings,
    network_set_a=np.array((res_a_0, res_a_1)),
    network_set_b=np.array((res_b_0, res_b_1)),
    data_set_a=np.array((data_a_0, data_a_1)),
    data_set_b=np.array((data_b_0, data_b_1))
    )

Following the example of a between design from the introduction, the collections of data sets np.array((data_a_0, data_a_1)) and np.array((data_b_0, data_b_1)) here correspond to data collected from individual participants under one of two conditions, A and B.

The different types of group comparisons differ in how the surrogate distribution for statistical testing is generated. Next, we describe how the test statistic is determined, before describing the generation of surrogate distributions in detail.

Determining the test statistic via the union network

For all tests, we first determine the difference between the two inferred networks as the statistic to be tested (a minimal sketch of these steps follows below):

  1. Find the union of links of all involved networks (union network)
    1. Within: union across both conditions
    2. Between: union across all units of observation and across conditions
  2. For each link in the union network, estimate mTE for both conditions
    1. Within: estimate mTE values for each link in the union network, once from data collected under condition A and once from data collected under condition B
    2. Between: estimate mTE values for each link in the union network from data collected for each unit of observation, across conditions
  3. For each link, calculate the difference in raw mTE estimates; this difference is the test statistic used in the permutation test
    1. Within: calculate the difference in mTE for each link between the two conditions
    2. Between: calculate the mean difference in mTE for each link between the two sets of networks

Note that a link denotes the total mTE from a source process to a target, where the source process can have many relevant past variables selected through non-uniform embedding.
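The per-link test statistic from step 3 can be illustrated with a minimal sketch for the within case. The dictionaries te_a and te_b are hypothetical placeholders for the raw mTE estimates from step 2; inside IDTxl, all of these steps are carried out by the NetworkComparison() class.

# Hypothetical raw mTE estimates per link (source, target) under each
# condition (step 2); in IDTxl these are computed internally.
te_a = {(0, 1): 0.21, (2, 1): 0.08, (0, 2): 0.14}
te_b = {(0, 1): 0.16, (2, 1): 0.11, (0, 2): 0.05}

# Step 1: union of links over both conditions (union network).
union_links = set(te_a) | set(te_b)

# Step 3: per-link difference in raw mTE estimates (the test statistic).
test_statistic = {link: te_a.get(link, 0.0) - te_b.get(link, 0.0)
                  for link in union_links}
print(test_statistic)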

Generation of surrogate distributions

Next, we generate a surrogate distribution for each link. Here, the within and between tests differ:

  1. Within:
    1. Swap replications randomly between the two conditions
    2. For each link and each condition, estimate TE from swapped data
    3. For each link, calculate the difference in TE estimates from both conditions, estimated from swapped data

Repeat until the distribution of surrogate values is sufficiently large (e.g., 500 times). Test the difference in mTE values for each individual link against its surrogate distribution.

  2. Between:
    1. Swap estimated mTE values between subjects
    2. For each link, calculate the difference in TE estimates

Repeat until the distribution of surrogate values is sufficiently large (e.g., 500 times). Test the original difference in mTE values against the distribution of differences from swapped data.

Note that generating the surrogate data for between tests is significantly faster because it does not require the re-estimation of the mTE for each surrogate value. Instead, for between tests, we estimate the initial mTE values, permute these estimates and repeatedly calculate the difference between swapped estimates.
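The following is a minimal sketch of this between-test permutation for a single link, in the independent case (pool, permute, and split). The arrays te_group_a and te_group_b are hypothetical placeholders for the precomputed per-subject mTE estimates; no re-estimation is needed inside the permutation loop.

import numpy as np

rng = np.random.default_rng(0)
te_group_a = np.array([0.18, 0.22, 0.15, 0.20])  # one mTE estimate per subject, group A
te_group_b = np.array([0.11, 0.09, 0.14, 0.12])  # one mTE estimate per subject, group B

# Original test statistic: mean difference between the two groups.
observed_diff = te_group_a.mean() - te_group_b.mean()

# Build the surrogate distribution by repeatedly swapping estimates between groups.
n_perm = 500
pooled = np.concatenate((te_group_a, te_group_b))
surrogate_diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    surrogate_diffs[i] = perm[:te_group_a.size].mean() - perm[te_group_a.size:].mean()

# Two-tailed p-value: fraction of surrogate differences at least as extreme.
p_value = np.mean(np.abs(surrogate_diffs) >= np.abs(observed_diff))
print(f'observed difference: {observed_diff:.3f}, p = {p_value:.3f}')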

Dependent versus independent testing

Whether the test we perform is dependent or independent affects the swapping in steps 1.i and 2.i of the generation of surrogate data:

  • For a dependent test, swapping happens between matched UOs only: in the within example, each baseline replication has a corresponding task replication, i.e., we swap each baseline randomly with its matched task replication; in the between example, we swap the mTE estimate of a patient with the estimate of the corresponding control participant.
  • For an independent test, swapping happens randomly between conditions: in the within example, replications from conditions A and B are pooled, permuted, and split to obtain a random partition of replications; in the between example, we pool, permute, and split mTE estimates randomly.
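The following minimal sketch contrasts the two swapping schemes for a single permutation, using hypothetical arrays in which entry i of both conditions belongs to the same unit of observation (replication or participant).

import numpy as np

rng = np.random.default_rng(0)
cond_a = np.array([0.20, 0.18, 0.25, 0.22])
cond_b = np.array([0.15, 0.16, 0.19, 0.17])

# Dependent test: swap only within matched pairs (random coin flip per pair).
swap = rng.random(cond_a.size) < 0.5
perm_a_dep = np.where(swap, cond_b, cond_a)
perm_b_dep = np.where(swap, cond_a, cond_b)

# Independent test: pool all units, permute, and split into two new groups.
pooled = np.concatenate((cond_a, cond_b))
perm = rng.permutation(pooled)
perm_a_indep, perm_b_indep = perm[:cond_a.size], perm[cond_a.size:]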