
Commit

Matthias Feurer: Merge pull request #1232 from openml/develop
GitHub Actions committed Mar 22, 2023
1 parent ace66bb commit f819812
Showing 280 changed files with 9,022 additions and 19,378 deletions.
2 changes: 1 addition & 1 deletion main/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 884c0728f1dea38019eaffe6df15f82c
+config: 977121ba2ad02efffcbb2ee6874bcd8d
tags: 645f666f9bcd5a90fca523b33c5a78b7
(next file)
@@ -51,7 +51,9 @@
# And we can use the evaluation listing functionality to learn more about
# the evaluations available for the conducted runs:
evaluations = openml.evaluations.list_evaluations(
-    function="predictive_accuracy", output_format="dataframe", study=study.study_id,
+    function="predictive_accuracy",
+    output_format="dataframe",
+    study=study.study_id,
)
print(evaluations.head())
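The hunk above only reflows the call into black's one-argument-per-line style with a trailing comma; behaviour is unchanged. A small stdlib sketch (the call text below is hypothetical, not the real API) illustrates why that style keeps future diffs minimal when an argument is added:

```python
import difflib

# Hypothetical before/after call sites: single-line vs. one-argument-per-line.
one_line_v1 = ['evaluations = list_evaluations(function="predictive_accuracy", study=99)\n']
one_line_v2 = ['evaluations = list_evaluations(function="predictive_accuracy", study=99, size=10)\n']

multi_v1 = [
    "evaluations = list_evaluations(\n",
    '    function="predictive_accuracy",\n',
    "    study=99,\n",
    ")\n",
]
multi_v2 = [
    "evaluations = list_evaluations(\n",
    '    function="predictive_accuracy",\n',
    "    study=99,\n",
    "    size=10,\n",
    ")\n",
]

# In the single-line style the whole statement changes (one removal plus one
# addition); in the exploded style only a single line is added.
single_changes = [d for d in difflib.ndiff(one_line_v1, one_line_v2) if d[0] in "+-"]
multi_changes = [d for d in difflib.ndiff(multi_v1, multi_v2) if d[0] in "+-"]
print(len(single_changes), len(multi_changes))  # 2 1
```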

(next file)
@@ -51,14 +51,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The dataset IDs could be used directly to load the dataset and split the data into a training set\nand a test set. However, to be reproducible, we will first obtain the respective tasks from\nOpenML, which define both the target feature and the train/test split.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>It is discouraged to work directly on datasets and only provide dataset IDs in a paper as\n    this does not allow reproducibility (unclear splitting). Please do not use datasets but the\n    respective tasks as basis for a paper and publish task IDS. This example is only given to\n    showcase the use of OpenML-Python for a published paper and as a warning on how not to do it.\n    Please check the `OpenML documentation of tasks <https://docs.openml.org/#tasks>`_ if you\n    want to learn more about them.</p></div>\n\n"
+"The dataset IDs could be used directly to load the dataset and split the data into a training set\nand a test set. However, to be reproducible, we will first obtain the respective tasks from\nOpenML, which define both the target feature and the train/test split.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>It is discouraged to work directly on datasets and only provide dataset IDs in a paper as\n    this does not allow reproducibility (unclear splitting). Please do not use datasets but the\n    respective tasks as basis for a paper and publish task IDS. This example is only given to\n    showcase the use of OpenML-Python for a published paper and as a warning on how not to do it.\n    Please check the [OpenML documentation of tasks](https://docs.openml.org/#tasks) if you\n    want to learn more about them.</p></div>\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"This lists both active and inactive tasks (because of ``status='all'``). Unfortunately,\nthis is necessary as some of the datasets contain issues found after the publication and became\ndeactivated, which also deactivated the tasks on them. More information on active or inactive\ndatasets can be found in the `online docs <https://docs.openml.org/#dataset-status>`_.\n\n"
+"This lists both active and inactive tasks (because of ``status='all'``). Unfortunately,\nthis is necessary as some of the datasets contain issues found after the publication and became\ndeactivated, which also deactivated the tasks on them. More information on active or inactive\ndatasets can be found in the [online docs](https://docs.openml.org/#dataset-status).\n\n"
]
},
{
@@ -89,7 +89,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -22,7 +22,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Openml-python uses the `Python logging module <https://docs.python.org/3/library/logging.html>`_\nto provide users with log messages. Each log message is assigned a level of importance, see\nthe table in Python's logging tutorial\n`here <https://docs.python.org/3/howto/logging.html#when-to-use-logging>`_.\n\nBy default, openml-python will print log messages of level `WARNING` and above to console.\nAll log messages (including `DEBUG` and `INFO`) are also saved in a file, which can be\nfound in your cache directory (see also the\n`sphx_glr_examples_20_basic_introduction_tutorial.py`).\nThese file logs are automatically deleted if needed, and use at most 2MB of space.\n\nIt is possible to configure what log levels to send to console and file.\nWhen downloading a dataset from OpenML, a `DEBUG`-level message is written:\n\n"
+"Openml-python uses the [Python logging module](https://docs.python.org/3/library/logging.html)\nto provide users with log messages. Each log message is assigned a level of importance, see\nthe table in Python's logging tutorial\n[here](https://docs.python.org/3/howto/logging.html#when-to-use-logging).\n\nBy default, openml-python will print log messages of level `WARNING` and above to console.\nAll log messages (including `DEBUG` and `INFO`) are also saved in a file, which can be\nfound in your cache directory (see also the\n`sphx_glr_examples_20_basic_introduction_tutorial.py`).\nThese file logs are automatically deleted if needed, and use at most 2MB of space.\n\nIt is possible to configure what log levels to send to console and file.\nWhen downloading a dataset from OpenML, a `DEBUG`-level message is written:\n\n"
]
},
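The cell above describes the default level split: `WARNING` and above to console, everything to a file. The mechanics behind that are the standard `logging` module's logger-versus-handler levels; the logger name below is illustrative, not openml-python's actual internal logger:

```python
import logging

# Illustrative logger name; openml-python wires up its own loggers internally.
logger = logging.getLogger("openml_demo")
logger.setLevel(logging.DEBUG)      # the logger itself accepts every level

console = logging.StreamHandler()
console.setLevel(logging.WARNING)   # console handler: WARNING and above only
logger.addHandler(console)

print(logger.isEnabledFor(logging.DEBUG))  # True: DEBUG reaches the logger...
print(console.level > logging.DEBUG)       # True: ...but the console handler filters it
```

A second handler with a lower level (e.g. a `FileHandler` at `DEBUG`) is what lets the same messages still land in the cache-directory log file.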
{
@@ -53,7 +53,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -82,7 +82,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -147,7 +147,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -136,7 +136,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"\n# Benchmark studies\nHow to list, download and upload benchmark studies.\nIn contrast to `benchmark suites <https://docs.openml.org/benchmark/#benchmarking-suites>`_ which\nhold a list of tasks, studies hold a list of runs. As runs contain all information on flows and\ntasks, all required information about a study can be retrieved.\n"
+"\n# Benchmark studies\nHow to list, download and upload benchmark studies.\nIn contrast to [benchmark suites](https://docs.openml.org/benchmark/#benchmarking-suites) which\nhold a list of tasks, studies hold a list of runs. As runs contain all information on flows and\ntasks, all required information about a study can be retrieved.\n"
]
},
{
@@ -123,7 +123,7 @@
},
"outputs": [],
"source": [
-"evaluations = openml.evaluations.list_evaluations(\n    function=\"predictive_accuracy\", output_format=\"dataframe\", study=study.study_id,\n)\nprint(evaluations.head())"
+"evaluations = openml.evaluations.list_evaluations(\n    function=\"predictive_accuracy\",\n    output_format=\"dataframe\",\n    study=study.study_id,\n)\nprint(evaluations.head())"
]
},
{
@@ -190,7 +190,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -118,7 +118,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -98,7 +98,7 @@
},
"outputs": [],
"source": [
-"from matplotlib import pyplot as plt\n\n\ndef plot_cdf(values, metric=\"predictive_accuracy\"):\n    max_val = max(values)\n    n, bins, patches = plt.hist(values, density=True, histtype=\"step\", cumulative=True, linewidth=3)\n    patches[0].set_xy(patches[0].get_xy()[:-1])\n    plt.xlim(max(0, min(values) - 0.1), 1)\n    plt.title(\"CDF\")\n    plt.xlabel(metric)\n    plt.ylabel(\"Likelihood\")\n    plt.grid(b=True, which=\"major\", linestyle=\"-\")\n    plt.minorticks_on()\n    plt.grid(b=True, which=\"minor\", linestyle=\"--\")\n    plt.axvline(max_val, linestyle=\"--\", color=\"gray\")\n    plt.text(max_val, 0, \"%.3f\" % max_val, fontsize=9)\n    plt.show()\n\n\nplot_cdf(evals.value, metric)\n# This CDF plot shows that for the given task, based on the results of the\n# runs uploaded, it is almost certain to achieve an accuracy above 52%, i.e.,\n# with non-zero probability. While the maximum accuracy seen till now is 96.5%."
+"from matplotlib import pyplot as plt\n\n\ndef plot_cdf(values, metric=\"predictive_accuracy\"):\n    max_val = max(values)\n    n, bins, patches = plt.hist(values, density=True, histtype=\"step\", cumulative=True, linewidth=3)\n    patches[0].set_xy(patches[0].get_xy()[:-1])\n    plt.xlim(max(0, min(values) - 0.1), 1)\n    plt.title(\"CDF\")\n    plt.xlabel(metric)\n    plt.ylabel(\"Likelihood\")\n    plt.grid(visible=True, which=\"major\", linestyle=\"-\")\n    plt.minorticks_on()\n    plt.grid(visible=True, which=\"minor\", linestyle=\"--\")\n    plt.axvline(max_val, linestyle=\"--\", color=\"gray\")\n    plt.text(max_val, 0, \"%.3f\" % max_val, fontsize=9)\n    plt.show()\n\n\nplot_cdf(evals.value, metric)\n# This CDF plot shows that for the given task, based on the results of the\n# runs uploaded, it is almost certain to achieve an accuracy above 52%, i.e.,\n# with non-zero probability. While the maximum accuracy seen till now is 96.5%."
]
},
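The substantive change in this cell is `plt.grid(b=...)` becoming `plt.grid(visible=...)`: Matplotlib 3.5 deprecated the `b` keyword of `grid()` in favour of `visible`, and later releases reject `b` entirely. A minimal headless sketch of the new spelling (this is a reduced stand-in, not the notebook's full `plot_cdf`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# `visible=` is the forward-compatible replacement for the deprecated `b=`.
plt.grid(visible=True, which="major", linestyle="-")
plt.minorticks_on()
plt.grid(visible=True, which="minor", linestyle="--")
print(ax.xaxis.get_gridlines()[0].get_visible())  # True
```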
{
@@ -154,7 +154,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,
(next file)
@@ -57,10 +57,18 @@
# easy as you want it to be


-cat_imp = make_pipeline(OneHotEncoder(handle_unknown="ignore", sparse=False), TruncatedSVD(),)
+cat_imp = make_pipeline(
+    OneHotEncoder(handle_unknown="ignore", sparse=False),
+    TruncatedSVD(),
+)
cont_imp = SimpleImputer(strategy="median")
ct = ColumnTransformer([("cat", cat_imp, cat), ("cont", cont_imp, cont)])
-model_original = Pipeline(steps=[("transform", ct), ("estimator", RandomForestClassifier()),])
+model_original = Pipeline(
+    steps=[
+        ("transform", ct),
+        ("estimator", RandomForestClassifier()),
+    ]
+)

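The two reformatted constructors above are behaviour-preserving black-style reflows. A stripped-down, self-contained variant of the same `Pipeline` shape (simplified steps, hypothetical `n_estimators`, no `ColumnTransformer` branch, so it runs without the notebook's data):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Simplified stand-in for the diff's pipeline: same black-formatted structure,
# without the OneHotEncoder/TruncatedSVD categorical branch.
model = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),
        ("estimator", RandomForestClassifier(n_estimators=10)),
    ]
)
print([name for name, _ in model.steps])  # ['impute', 'estimator']
```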
# Let's change some hyperparameters. Of course, in any good application we
# would tune them using, e.g., Random Search or Bayesian Optimization, but for
(next file)
@@ -281,7 +281,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.16"
}
},
"nbformat": 4,