Commit

[ci skip] Remove extra validation set (#80) ddc2f35
Vincent-Maladiere committed Nov 27, 2024
1 parent 0b2adfe commit 855d324
Showing 18 changed files with 68 additions and 71 deletions.
Binary file modified .doctrees/auto_examples/plot_01_survival_analysis.doctree
Binary file modified .doctrees/environment.pickle
@@ -44,7 +44,7 @@
},
"outputs": [],
"source": [
-"from sklearn.model_selection import train_test_split\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\nX_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)"
+"from sklearn.model_selection import train_test_split\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)"
]
},
{
@@ -48,8 +48,7 @@
# %%
from sklearn.model_selection import train_test_split

-X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
-X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# %%
#
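Before this commit the example carved a validation set out of the training data: 20% went to test, then 20% of the remaining 80% went to validation, which left 64% of rows for training. After the commit, a single `test_size=0.3` split keeps 70% for training and 30% for testing. The stdlib sketch below checks that arithmetic on a hypothetical 1,000-row dataset. `shuffle_split` is a hypothetical helper that rounds the test part up. It imitates scikit-learn's `train_test_split` but is not that function.

```python
import math
import random


def shuffle_split(rows, test_size, seed=0):
    """Hypothetical helper: shuffle `rows`, then split off a `test_size` fraction.

    Returns (train, test). This imitates a random train/test split; it is not
    scikit-learn's implementation.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))  # round the test part up
    return shuffled[n_test:], shuffled[:n_test]


rows = range(1000)  # hypothetical dataset of 1,000 patients

# Before the commit: a test split, then a validation split taken from train.
train_old, test_old = shuffle_split(rows, test_size=0.2)
train_old, val_old = shuffle_split(train_old, test_size=0.2, seed=1)
print(len(train_old), len(val_old), len(test_old))  # 640 160 200

# After the commit: a single 70/30 split and no validation set.
train_new, test_new = shuffle_split(rows, test_size=0.3)
print(len(train_new), len(test_new))  # 700 300
```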
Binary file modified _images/sphx_glr_plot_01_survival_analysis_001.png
Binary file modified _images/sphx_glr_plot_01_survival_analysis_002.png
Binary file modified _images/sphx_glr_plot_01_survival_analysis_003.png
Binary file modified _images/sphx_glr_plot_01_survival_analysis_thumb.png
47 changes: 23 additions & 24 deletions _sources/auto_examples/plot_01_survival_analysis.rst.txt
@@ -187,14 +187,13 @@ In this dataset, approximately 42% of the data is censored.
-.. GENERATED FROM PYTHON SOURCE LINES 49-54
+.. GENERATED FROM PYTHON SOURCE LINES 49-53
.. code-block:: Python
from sklearn.model_selection import train_test_split
-X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
-X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
@@ -203,7 +202,7 @@ In this dataset, approximately 42% of the data is censored.
-.. GENERATED FROM PYTHON SOURCE LINES 55-71
+.. GENERATED FROM PYTHON SOURCE LINES 54-70
Using SurvivalBoost to estimate the survival function
-----------------------------------------------------
@@ -222,7 +221,7 @@ SurvivalBoost is a scikit-learn compatible model which expects a covariates data
(or array-like) ``X``, and a target dataframe ``y`` with columns "event" and
"duration". This allows SurvivalBoost to estimate the survival function :math:`S`.

-.. GENERATED FROM PYTHON SOURCE LINES 72-78
+.. GENERATED FROM PYTHON SOURCE LINES 71-77
.. code-block:: Python
@@ -649,15 +648,15 @@ SurvivalBoost is a scikit-learn compatible model which expects a covariates data
<br />
<br />

-.. GENERATED FROM PYTHON SOURCE LINES 79-84
+.. GENERATED FROM PYTHON SOURCE LINES 78-83
SurvivalBoost can then predict the survival function for each patient,
according to some time grid of horizons.
**The time grid is learned during fit but can be passed during prediction**
with the parameter ``times``.
When ``times`` is set to ``None``, the model will use the learned time grid.

-.. GENERATED FROM PYTHON SOURCE LINES 85-94
+.. GENERATED FROM PYTHON SOURCE LINES 84-93
.. code-block:: Python
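The paragraph above says that predictions default to the time grid learned during fit, and that an explicit ``times`` argument overrides it. The sketch below illustrates only that calling convention on a survival curve stored as a step function. `evaluate_step_curve`, the grid and the probabilities are all hypothetical. SurvivalBoost itself may compute predictions at arbitrary times differently instead of reading them off a stored curve.

```python
import bisect


def evaluate_step_curve(grid, values, times=None):
    """Hypothetical helper: read a right-continuous survival step curve.

    grid: increasing time points (a stand-in for a learned time grid).
    values: survival probability on [grid[i], grid[i + 1]).
    times=None returns the curve on the learned grid itself.
    """
    if len(grid) != len(values):
        raise ValueError("grid and values must have the same length")
    if times is None:
        return list(values)
    out = []
    for t in times:
        i = bisect.bisect_right(grid, t) - 1
        # Assumption: before the first grid point no event has occurred, so S(t) = 1.
        out.append(1.0 if i < 0 else values[i])
    return out


learned_grid = [10, 20, 30]  # hypothetical grid stored during fit
survival = [0.9, 0.7, 0.4]   # hypothetical S(t) for one patient
print(evaluate_step_curve(learned_grid, survival))                      # [0.9, 0.7, 0.4]
print(evaluate_step_curve(learned_grid, survival, times=[5, 25, 100]))  # [1.0, 0.7, 0.4]
```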
@@ -677,11 +676,11 @@ When ``times`` is set to ``None``, the model will use the learned time grid.
-.. GENERATED FROM PYTHON SOURCE LINES 95-96
+.. GENERATED FROM PYTHON SOURCE LINES 94-95
Let's plot the estimated survival function for some patients.

-.. GENERATED FROM PYTHON SOURCE LINES 96-127
+.. GENERATED FROM PYTHON SOURCE LINES 95-126
.. code-block:: Python
@@ -728,7 +727,7 @@ Let's plot the estimated survival function for some patients.



-.. GENERATED FROM PYTHON SOURCE LINES 128-138
+.. GENERATED FROM PYTHON SOURCE LINES 127-137
Measuring features impact on predictions
----------------------------------------
@@ -741,7 +740,7 @@ features to eliminate correlations.
We create a synthetic dataset where age (``x8``) is resampled to reduce
confounder bias.

-.. GENERATED FROM PYTHON SOURCE LINES 139-187
+.. GENERATED FROM PYTHON SOURCE LINES 138-186
.. code-block:: Python
@@ -805,15 +804,15 @@ confounder bias.



-.. GENERATED FROM PYTHON SOURCE LINES 188-193
+.. GENERATED FROM PYTHON SOURCE LINES 187-192
Unsurprisingly, the cumulative incidence of death mostly increases with age.
We can do the same thing with chemotherapy treatment.

Let's create a synthetic dataset where chemotherapy (``x6``)
alternates between 0 and 1.

-.. GENERATED FROM PYTHON SOURCE LINES 194-235
+.. GENERATED FROM PYTHON SOURCE LINES 193-234
.. code-block:: Python
@@ -870,7 +869,7 @@ alternates between 0 and 1.



-.. GENERATED FROM PYTHON SOURCE LINES 236-304
+.. GENERATED FROM PYTHON SOURCE LINES 235-303
People treated with chemotherapy likely have more advanced stages of cancer, which is
reflected by the lower estimated survival function. This serves as a reminder that
@@ -941,7 +940,7 @@ summarize the Brier score in time:
\mathrm{BS(t)} dt
-.. GENERATED FROM PYTHON SOURCE LINES 305-315
+.. GENERATED FROM PYTHON SOURCE LINES 304-314
.. code-block:: Python
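The integrated Brier score (IBS) summarizes the time-dependent Brier score by averaging it over the evaluation window: IBS = 1 / (t_max - t_min) * (integral of BS(t) dt from t_min to t_max). The sketch below assumes the BS(t) values have already been computed on a grid of times, for example with censoring weights, which it does not cover. It only approximates the integral with the trapezoidal rule. The function name is hypothetical, and the library used in the example may integrate differently.

```python
def integrated_brier_score(times, brier_scores):
    """Hypothetical helper: trapezoidal estimate of the time-averaged Brier score.

    times: strictly increasing evaluation times.
    brier_scores: BS(t) evaluated at each of those times.
    """
    if len(times) != len(brier_scores) or len(times) < 2:
        raise ValueError("need at least two (time, score) pairs of equal length")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (brier_scores[i - 1] + brier_scores[i]) * width
    # Divide by the window length to turn the area into an average score.
    return area / (times[-1] - times[0])


print(integrated_brier_score([0, 10, 20], [0.0, 0.2, 0.2]))  # 0.15
```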
@@ -963,17 +962,17 @@ summarize the Brier score in time:

.. code-block:: none
-IBS for SurvivalBoost: 0.1382
+IBS for SurvivalBoost: 0.1439
-.. GENERATED FROM PYTHON SOURCE LINES 316-318
+.. GENERATED FROM PYTHON SOURCE LINES 315-317
We can compare this to the Integrated Brier score of a simple Kaplan-Meier estimator,
which doesn't take the patient features into account.

-.. GENERATED FROM PYTHON SOURCE LINES 319-339
+.. GENERATED FROM PYTHON SOURCE LINES 318-338
.. code-block:: Python
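The Kaplan-Meier estimator mentioned above is the product-limit estimate S(t) = product over event times t_j <= t of (1 - d_j / n_j). Here d_j is the number of events at t_j and n_j is the number of patients still at risk just before t_j. Because it ignores patient features, every patient gets the same curve. The sketch below is a minimal stdlib version. It assumes that 0 marks a censored row and any non-zero value an observed event, which is an assumption about how the ``event`` column is encoded. It is not the implementation used in the example.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return (event_times, survival) for the Kaplan-Meier estimator.

    Assumes events[i] == 0 means censored and any non-zero value an observed event.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # patients leaving the risk set at each time
    at_risk = len(durations)
    s = 1.0
    event_times, survival = [], []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Events at t are counted before censorings at the same time.
            s *= 1.0 - d / at_risk
            event_times.append(t)
            survival.append(s)
        at_risk -= leaving[t]
    return event_times, survival


times, surv = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(times, [round(v, 4) for v in surv])  # [1, 2, 3, 5] [0.8333, 0.6667, 0.4444, 0.0]
```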
@@ -1005,16 +1004,16 @@ which doesn't take the patient features into account.

.. code-block:: none
-IBS for Kaplan-Meier: 0.1566
+IBS for Kaplan-Meier: 0.1653
-.. GENERATED FROM PYTHON SOURCE LINES 340-341
+.. GENERATED FROM PYTHON SOURCE LINES 339-340
Let's also compute the concordance index for both the Kaplan-Meier and SurvivalBoost.

-.. GENERATED FROM PYTHON SOURCE LINES 344-353
+.. GENERATED FROM PYTHON SOURCE LINES 343-352
.. code-block:: Python
@@ -1040,13 +1039,13 @@ Let's also compute the concordance index for both the Kaplan-Meier and SurvivalB
-.. GENERATED FROM PYTHON SOURCE LINES 354-357
+.. GENERATED FROM PYTHON SOURCE LINES 353-356
0.5 corresponds to random chance, which makes sense as the Kaplan-Meier estimator
doesn't depend on the patient features.


-.. GENERATED FROM PYTHON SOURCE LINES 358-365
+.. GENERATED FROM PYTHON SOURCE LINES 357-364
.. code-block:: Python
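Harrell's concordance index is the fraction of comparable pairs in which the model assigns the higher risk to the patient who failed first, with ties counted as one half. A pair is comparable when the patient with the shorter duration had the event. A constant prediction, such as the Kaplan-Meier curve shared by every patient, ties on every pair and therefore scores exactly 0.5. The stdlib sketch below is a simplified version that skips pairs with tied durations. The concordance function used in the example is not shown here and may be time-dependent.

```python
from itertools import combinations


def harrell_c_index(durations, events, risk_scores):
    """Simplified Harrell's C-index; a higher risk score should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: skip pairs with tied durations
        first, second = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[first]:
            continue  # censored first: we cannot tell who failed first
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations, events = [1, 2, 3, 4], [1, 1, 1, 0]
print(harrell_c_index(durations, events, [4, 3, 2, 1]))  # 1.0 (perfect ranking)
print(harrell_c_index(durations, events, [0.5] * 4))     # 0.5 (constant prediction)
```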
@@ -1073,7 +1072,7 @@ doesn't depend on the patient features.
.. rst-class:: sphx-glr-timing

-**Total running time of the script:** (0 minutes 6.993 seconds)
+**Total running time of the script:** (0 minutes 7.376 seconds)


.. _sphx_glr_download_auto_examples_plot_01_survival_analysis.py:
@@ -277,15 +277,15 @@ theoretical CIFs:

.. code-block:: none
-Integrated theoretical any event survival curve in 0.662 s
-SurvivalBoost fit: 2.690 s
-SurvivalBoost prediction: 2.927 s
-Integrated theoretical cumulative incidence curve for event 1 in 2.988 s
-Aalen-Johansen for event 1 fit in 4.937 s
-Integrated theoretical cumulative incidence curve for event 2 in 5.032 s
-Aalen-Johansen for event 2 fit in 5.018 s
-Integrated theoretical cumulative incidence curve for event 3 in 5.096 s
-Aalen-Johansen for event 3 fit in 4.976 s
+Integrated theoretical any event survival curve in 0.614 s
+SurvivalBoost fit: 2.766 s
+SurvivalBoost prediction: 2.911 s
+Integrated theoretical cumulative incidence curve for event 1 in 2.971 s
+Aalen-Johansen for event 1 fit in 5.112 s
+Integrated theoretical cumulative incidence curve for event 2 in 5.210 s
+Aalen-Johansen for event 2 fit in 5.024 s
+Integrated theoretical cumulative incidence curve for event 3 in 5.102 s
+Aalen-Johansen for event 3 fit in 4.967 s
@@ -328,15 +328,15 @@ of censoring.

.. code-block:: none
-Integrated theoretical any event survival curve in 0.591 s
-SurvivalBoost fit: 2.705 s
-SurvivalBoost prediction: 2.940 s
-Integrated theoretical cumulative incidence curve for event 1 in 3.000 s
-Aalen-Johansen for event 1 fit in 4.967 s
-Integrated theoretical cumulative incidence curve for event 2 in 5.058 s
-Aalen-Johansen for event 2 fit in 4.967 s
-Integrated theoretical cumulative incidence curve for event 3 in 5.045 s
-Aalen-Johansen for event 3 fit in 5.035 s
+Integrated theoretical any event survival curve in 0.576 s
+SurvivalBoost fit: 2.708 s
+SurvivalBoost prediction: 2.914 s
+Integrated theoretical cumulative incidence curve for event 1 in 2.974 s
+Aalen-Johansen for event 1 fit in 4.936 s
+Integrated theoretical cumulative incidence curve for event 2 in 5.027 s
+Aalen-Johansen for event 2 fit in 4.917 s
+Integrated theoretical cumulative incidence curve for event 3 in 4.995 s
+Aalen-Johansen for event 3 fit in 4.988 s
@@ -360,7 +360,7 @@ the large time horizons:

.. rst-class:: sphx-glr-timing

-**Total running time of the script:** (0 minutes 43.187 seconds)
+**Total running time of the script:** (0 minutes 43.202 seconds)


.. _sphx_glr_download_auto_examples_plot_02_marginal_cumulative_incidence_estimation.py:
9 changes: 4 additions & 5 deletions auto_examples/plot_01_survival_analysis.html
@@ -506,8 +506,7 @@
</div>
<div class="highlight-Python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split" title="sklearn.model_selection.train_test_split" class="sphx-glr-backref-module-sklearn-model_selection sphx-glr-backref-type-py-function"><span class="n">train_test_split</span></a>

-<span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split" title="sklearn.model_selection.train_test_split" class="sphx-glr-backref-module-sklearn-model_selection sphx-glr-backref-type-py-function"><span class="n">train_test_split</span></a><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">)</span>
-<span class="n">X_train</span><span class="p">,</span> <span class="n">X_val</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_val</span> <span class="o">=</span> <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split" title="sklearn.model_selection.train_test_split" class="sphx-glr-backref-module-sklearn-model_selection sphx-glr-backref-type-py-function"><span class="n">train_test_split</span></a><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">)</span>
+<span class="n">X_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split" title="sklearn.model_selection.train_test_split" class="sphx-glr-backref-module-sklearn-model_selection sphx-glr-backref-type-py-function"><span class="n">train_test_split</span></a><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
</pre></div>
</div>
<section id="using-survivalboost-to-estimate-the-survival-function">
@@ -1157,7 +1156,7 @@ <h2>Survival model evaluation<a class="headerlink" href="#survival-model-evaluat
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;IBS for SurvivalBoost: </span><span class="si">{</span><span class="n">ibs_survboost</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
</div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>IBS for SurvivalBoost: 0.1382
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>IBS for SurvivalBoost: 0.1439
</pre></div>
</div>
<p>We can compare this to the Integrated Brier score of a simple Kaplan-Meier estimator,
@@ -1182,7 +1181,7 @@ <h2>Survival model evaluation<a class="headerlink" href="#survival-model-evaluat
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;IBS for Kaplan-Meier: </span><span class="si">{</span><span class="n">ibs_km</span><span class="si">:</span><span class="s2">.4f</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
</div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>IBS for Kaplan-Meier: 0.1566
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>IBS for Kaplan-Meier: 0.1653
</pre></div>
</div>
<p>Let’s also compute the concordance index for both the Kaplan-Meier and SurvivalBoost.</p>
@@ -1212,7 +1211,7 @@ <h2>Survival model evaluation<a class="headerlink" href="#survival-model-evaluat
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Concordance index for SurvivalBoost: 0.67
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> (0 minutes 6.993 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> (0 minutes 7.376 seconds)</p>
<div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-auto-examples-plot-01-survival-analysis-py">
<div class="sphx-glr-download sphx-glr-download-jupyter docutils container">
<p><a class="reference download internal" download="" href="../_downloads/a6916f06450964ef8d10eb5f311100d1/plot_01_survival_analysis.ipynb"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Jupyter</span> <span class="pre">notebook:</span> <span class="pre">plot_01_survival_analysis.ipynb</span></code></a></p>
38 changes: 19 additions & 19 deletions auto_examples/plot_02_marginal_cumulative_incidence_estimation.html
@@ -559,15 +559,15 @@ <h2>CIFs estimated on uncensored data<a class="headerlink" href="#cifs-estimated
<span class="p">)</span>
</pre></div>
</div>
-<img src="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_001.png" srcset="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_001.png" alt="Cause-specific cumulative incidence functions (0.0% censoring), Event 1, Event 2, Event 3" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Integrated theoretical any event survival curve in 0.662 s
-SurvivalBoost fit: 2.690 s
-SurvivalBoost prediction: 2.927 s
-Integrated theoretical cumulative incidence curve for event 1 in 2.988 s
-Aalen-Johansen for event 1 fit in 4.937 s
-Integrated theoretical cumulative incidence curve for event 2 in 5.032 s
-Aalen-Johansen for event 2 fit in 5.018 s
-Integrated theoretical cumulative incidence curve for event 3 in 5.096 s
-Aalen-Johansen for event 3 fit in 4.976 s
+<img src="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_001.png" srcset="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_001.png" alt="Cause-specific cumulative incidence functions (0.0% censoring), Event 1, Event 2, Event 3" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Integrated theoretical any event survival curve in 0.614 s
+SurvivalBoost fit: 2.766 s
+SurvivalBoost prediction: 2.911 s
+Integrated theoretical cumulative incidence curve for event 1 in 2.971 s
+Aalen-Johansen for event 1 fit in 5.112 s
+Integrated theoretical cumulative incidence curve for event 2 in 5.210 s
+Aalen-Johansen for event 2 fit in 5.024 s
+Integrated theoretical cumulative incidence curve for event 3 in 5.102 s
+Aalen-Johansen for event 3 fit in 4.967 s
</pre></div>
</div>
</section>
@@ -590,15 +590,15 @@ <h2>CIFs estimated on censored data<a class="headerlink" href="#cifs-estimated-o
<span class="n">plot_cumulative_incidence_functions</span><span class="p">(</span><a href="../generated/hazardous.SurvivalBoost.html#hazardous.SurvivalBoost" title="hazardous.SurvivalBoost" class="sphx-glr-backref-module-hazardous sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">survival_boost</span></a><span class="o">=</span><a href="../generated/hazardous.SurvivalBoost.html#hazardous.SurvivalBoost" title="hazardous.SurvivalBoost" class="sphx-glr-backref-module-hazardous sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">survival_boost</span></a><span class="p">,</span> <span class="n">aj</span><span class="o">=</span><span class="n">aj</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y_censored</span><span class="p">)</span>
</pre></div>
</div>
-<img src="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_002.png" srcset="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_002.png" alt="Cause-specific cumulative incidence functions (40.4% censoring), Event 1, Event 2, Event 3" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Integrated theoretical any event survival curve in 0.591 s
-SurvivalBoost fit: 2.705 s
-SurvivalBoost prediction: 2.940 s
-Integrated theoretical cumulative incidence curve for event 1 in 3.000 s
-Aalen-Johansen for event 1 fit in 4.967 s
-Integrated theoretical cumulative incidence curve for event 2 in 5.058 s
-Aalen-Johansen for event 2 fit in 4.967 s
-Integrated theoretical cumulative incidence curve for event 3 in 5.045 s
-Aalen-Johansen for event 3 fit in 5.035 s
+<img src="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_002.png" srcset="../_images/sphx_glr_plot_02_marginal_cumulative_incidence_estimation_002.png" alt="Cause-specific cumulative incidence functions (40.4% censoring), Event 1, Event 2, Event 3" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Integrated theoretical any event survival curve in 0.576 s
+SurvivalBoost fit: 2.708 s
+SurvivalBoost prediction: 2.914 s
+Integrated theoretical cumulative incidence curve for event 1 in 2.974 s
+Aalen-Johansen for event 1 fit in 4.936 s
+Integrated theoretical cumulative incidence curve for event 2 in 5.027 s
+Aalen-Johansen for event 2 fit in 4.917 s
+Integrated theoretical cumulative incidence curve for event 3 in 4.995 s
+Aalen-Johansen for event 3 fit in 4.988 s
</pre></div>
</div>
<p>Note that the Aalen-Johansen estimator is unbiased and empirically recovers
@@ -613,7 +613,7 @@ <h2>CIFs estimated on censored data<a class="headerlink" href="#cifs-estimated-o
<p>Alternatively, we could try to enable a monotonicity constraint at training
time; however, in practice this often causes a severe over-estimation bias for
the large time horizons:</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> (0 minutes 43.187 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> (0 minutes 43.202 seconds)</p>
<div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-auto-examples-plot-02-marginal-cumulative-incidence-estimation-py">
<div class="sphx-glr-download sphx-glr-download-jupyter docutils container">
<p><a class="reference download internal" download="" href="../_downloads/8da6be5df74b4f584c69dbcd5de4f948/plot_02_marginal_cumulative_incidence_estimation.ipynb"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Jupyter</span> <span class="pre">notebook:</span> <span class="pre">plot_02_marginal_cumulative_incidence_estimation.ipynb</span></code></a></p>
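The example notes that the Aalen-Johansen estimator empirically recovers the theoretical cumulative incidence functions. For cause k, the estimator sums over event times t_j <= t. Each term multiplies the all-cause Kaplan-Meier survival just before t_j by the fraction d_kj / n_j of at-risk patients who experienced cause k at t_j. Every increment is non-negative, so the estimate is monotone by construction. The sketch below is a minimal stdlib version. It assumes that 0 encodes censoring and that 1, 2, 3 encode the competing events, which is an assumption about the dataset encoding. It is not the implementation used in the example.

```python
from collections import Counter


def aalen_johansen(durations, events, event_of_interest):
    """Cumulative incidence of one cause under competing risks.

    Assumes events[i] == 0 means censored and any other value is a cause id.
    Returns (times, cif) at each distinct time where some event was observed.
    """
    any_event = Counter(t for t, e in zip(durations, events) if e != 0)
    cause = Counter(t for t, e in zip(durations, events) if e == event_of_interest)
    leaving = Counter(durations)  # patients leaving the risk set at each time
    at_risk = len(durations)
    surv_before = 1.0  # all-cause Kaplan-Meier survival just before t
    cif = 0.0
    times, values = [], []
    for t in sorted(leaving):
        d_all = any_event.get(t, 0)
        if d_all:
            cif += surv_before * cause.get(t, 0) / at_risk
            surv_before *= 1.0 - d_all / at_risk
            times.append(t)
            values.append(cif)
        at_risk -= leaving[t]
    return times, values


durations = [1, 2, 3, 4, 5, 6]
events = [1, 2, 0, 1, 2, 1]  # 0 = censored, 1 and 2 = competing causes
times, cif1 = aalen_johansen(durations, events, event_of_interest=1)
_, cif2 = aalen_johansen(durations, events, event_of_interest=2)
print(times, [round(v, 4) for v in cif1])  # [1, 2, 4, 5, 6] [0.1667, 0.1667, 0.3889, 0.3889, 0.6111]
print([round(v, 4) for v in cif2])         # [0.0, 0.1667, 0.1667, 0.3889, 0.3889]
```

Because everyone in this toy sample eventually experiences some event, the per-cause incidences add up to 1 at the last time point.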
2 changes: 1 addition & 1 deletion searchindex.js
