[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

essink · 2022-03-14T17:17:49Z

Describe the bug
The estimation of the instantaneous_rate with kernel="auto" of a very long spiketrain (~650 s) yielded a very smooth and flat rate. It was tracked down to the estimation of the optimal_kernel_bandwidth (the width of the Gaussian kernel, fixed for the whole duration), which yielded a bandwidth of ~22 s for spiketrain with a rather low firing rate.

After some investigation, it was noticed that the optimal bandwidth strongly depends on the duration of the spiketrain. One reason for this might be the hard-coded number of bins (1000) here:

elephant/elephant/statistics.py

Lines 1576 to 1597 in 67dd3f3

    
           if times is None: 
        
               time = np.max(spiketimes) - np.min(spiketimes) 
        
               isi = np.diff(spiketimes) 
        
               isi = isi[isi > 0].copy() 
        
               dt = np.min(isi) 
        
               times = np.linspace(np.min(spiketimes), 
        
                                   np.max(spiketimes), 
        
                                   min(int(time / dt + 0.5), 
        
                                       1000))  # The 1000 seems somewhat arbitrary 
        
               t = times 
        
           else: 
        
               time = np.max(times) - np.min(times) 
        
               spiketimes = spiketimes[(spiketimes >= np.min(times)) & 
        
                                       (spiketimes <= np.max(times))].copy() 
        
               isi = np.diff(spiketimes) 
        
               isi = isi[isi > 0].copy() 
        
               dt = np.min(isi) 
        
               if dt > np.min(np.diff(times)): 
        
                   t = np.linspace(np.min(times), np.max(times), 
        
                                   min(int(time / dt + 0.5), 1000)) 
        
               else: 
        
                   t = times

Although, the option to give an array of times on which the optimization of the kernel is performed exists, it is not very accessible. (See original publication Shimazaki, H. & Shinomoto, S. Kernel bandwidth optimization in spike rate estimation. J Comput Neurosci 29, 171–182 (2010). for details)

Further notice: The implemented optimal_kernel_bandwidth estimates one fixed kernel bandwidth for the given spiketrain and is not adaptive over time! The originating code https://github.com/cooperlab/AdaptiveKDE also implements the adaptive case and could be included into elephant.

To Reproduce

Get spiketimes
Call elephant.statistics.optimal_kernel_bandwidth(spiketimes)

A minimal example is provided in the attached files:
issue_optimal_kernel_bandwidth.zip

Expected behavior
The order of magnitude should be reasonable and if this is exceeded a warning should be raise.

Environment

OS (e.g., Linux): Ubuntu
How you installed elephant (conda, pip, source): conda/pip
Python version: 3.9.7
neo python package version: 0.10.2
elephant python package version: 0.10.0

The text was updated successfully, but these errors were encountered:

Moritz-Alexander-Kern · 2022-03-15T10:01:15Z

Thank you for reporting this and your effort to contribute to elephant.

I have extended your minimal example to illustrate the issue in connection with elephant.statistics.instantaneous_rate.

import pickle
import quantities as pq
import elephant
import matplotlib.pyplot as plt

# Load example spiketrains
high_firing_rate_st, low_firing_rate_st = pickle.load(
    open('spiketrain_examples.pkl', 'rb'))
# firing rates
print(high_firing_rate_st.annotations)
print(low_firing_rate_st.annotations)

{'mean_firing_rate': array(22.00288951) * 1/s}
{'mean_firing_rate': array(3.85978253) * 1/s}

# Look at kernel bandwidth estimation

# spiketrain = high_firing_rate_st
spiketrain = low_firing_rate_st
spiketimes = spiketrain.times.rescale(pq.s).magnitude

# Choose example spiketrain
kernel_bandwidth = elephant.statistics.optimal_kernel_bandwidth(spiketimes)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
      f'estimated to be {kernel_bandwidth["optw"]} s')

The optimal kernel bandwidth for the full length spiketrain was estimated to be 22.401845783799754 s

# Time slice the spiketrain to a 5 s duration
time_sliced_st = spiketrain.time_slice(t_start=1*pq.s, t_stop=6*pq.s)
spiketimes_sliced = time_sliced_st.times.rescale(pq.s).magnitude

kernel_bandwidth_sliced = elephant.statistics.optimal_kernel_bandwidth(
    spiketimes_sliced)
print(f'The optimal kernel bandwidth for the full length spiketrain was '
      f'estimated to be {kernel_bandwidth_sliced["optw"]} s')

The optimal kernel bandwidth for the full length spiketrain was estimated to be 0.5003558368563431 s

# Look at instantaneous rate


def plot_rate(inst_rate):
    plt.plot(inst_rate.times.simplified.magnitude,
             inst_rate.magnitude, color='orange',
             label='instantaneous rate')
    plt.legend()
    plt.title(f"{inst_rate.annotations['kernel']['type']}: "
              f"sigma={inst_rate.annotations['kernel']['sigma']}, "
              f"sampling_period={int(sampling_period.item())} ms")
    fig = plt.gcf()
    fig.set_size_inches(18.5, 10.5)
    plt.show()


sampling_period = 10 * pq.ms
# use kernel "auto"
rate_auto = elephant.statistics.instantaneous_rate(
    spiketrain,
    sampling_period=sampling_period,
    kernel="auto")

# plot result
plot_rate(rate_auto)

Figure 1:

# calculate rate using kernel bandwidth based on time slice
kernel = elephant.kernels.GaussianKernel(
    sigma=kernel_bandwidth_sliced["optw"] * pq.s)

rate_slice = elephant.statistics.instantaneous_rate(
    spiketrain,
    sampling_period=sampling_period,
    kernel=kernel)

# plot result
plot_rate(rate_slice)

Figure 2:

The two figures above show the difference for large and small kernel bandwidths.
If I understand correctly Figure 2 is the desired result for this case, Figure 1 shows a rate estimate, which is too smooth. (bandwidth too large)

possible solutions

rise a warning to the user when elephant.statistics.optimal_kernel_bandwidth(spiketimes) is used and the resulting bandwidth is too large.
- In order to rise this warning in optimal_kernel_bandwidth, the following question comes to mind:
  Is there a way to estimate if the bandwidth is reasonable ? or would it be a fixed value ?
use the adaptive kernel functionality proposed in PR Enh/adaptive kernel #462.

change the hard coded number of bins (Lines 1576 to 1597)

elephant/elephant/statistics.py

Lines 1576 to 1597 in 67dd3f3

    
           if times is None: 
        
               time = np.max(spiketimes) - np.min(spiketimes) 
        
               isi = np.diff(spiketimes) 
        
               isi = isi[isi > 0].copy() 
        
               dt = np.min(isi) 
        
               times = np.linspace(np.min(spiketimes), 
        
                                   np.max(spiketimes), 
        
                                   min(int(time / dt + 0.5), 
        
                                       1000))  # The 1000 seems somewhat arbitrary 
        
               t = times 
        
           else: 
        
               time = np.max(times) - np.min(times) 
        
               spiketimes = spiketimes[(spiketimes >= np.min(times)) & 
        
                                       (spiketimes <= np.max(times))].copy() 
        
               isi = np.diff(spiketimes) 
        
               isi = isi[isi > 0].copy() 
        
               dt = np.min(isi) 
        
               if dt > np.min(np.diff(times)): 
        
                   t = np.linspace(np.min(times), np.max(times), 
        
                                   min(int(time / dt + 0.5), 1000)) 
        
               else: 
        
                   t = times

I guess further steps should be discussed in connection to PR #462, thanks again for contributing this.

essink · 2022-03-15T10:19:59Z

Hi Moritz,

thanks for having a look at it so quickly! Just to clarify a few points:

What an expected or reasonable bandwidth should be, is a very valid question and I don't have a proper answer to that.
I would consider a 500 ms kernel bandwidth still as quite large.
The PR Enh/adaptive kernel #462 does not solve the issue yet! It was just created while trying to solve it. (The hard-coded number of bins are definitely not ideal, but seem not to cause the main problem here)
This method of estimating the instantaneous rate was, traditionally, applied to the spiketrains of sets of trials, which were short in duration. It might be a limitation of the method itself.

Having said that, I agree that we can move the discussion to PR #462 .

essink mentioned this issue Mar 14, 2022

Enh/adaptive kernel #462

Closed

Moritz-Alexander-Kern added the new functionality New modules, functions label Mar 21, 2022

Moritz-Alexander-Kern linked a pull request Apr 4, 2022 that will close this issue

Enh/adaptive kernel #462

Closed

Moritz-Alexander-Kern added this to the v0.12.0 milestone Apr 14, 2022

Moritz-Alexander-Kern modified the milestones: v0.12.0, v0.13.0 Dec 5, 2022

Moritz-Alexander-Kern modified the milestones: v0.13.0, v0.14.0 Jul 5, 2023

Moritz-Alexander-Kern removed this from the v0.14.0 milestone Jul 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

essink commented Mar 14, 2022

Moritz-Alexander-Kern commented Mar 15, 2022

essink commented Mar 15, 2022

[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

[Bug] optimal_kernel_bandwidth yields extremely large kernel widths #461

Comments

essink commented Mar 14, 2022

Moritz-Alexander-Kern commented Mar 15, 2022

essink commented Mar 15, 2022