
Mention if papermill is not installed in failed status #10

Merged: 6 commits merged on May 29, 2024


nkaretnikov
Collaborator

@nkaretnikov commented May 18, 2024

Reference Issues or PRs

Fixes #8.

This PR adds extra information to the "failed" status message when papermill is not installed. It works by capturing the exit status of the papermill command and writing it to a file (papermill_status.txt) in the job's staging area, which is on a shared filesystem accessible to all Argo steps of the job.
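The description does not show the implementation, so the following is only a minimal Python sketch of the idea, assuming papermill is launched as a subprocess. `run_papermill_and_record` and its parameters are hypothetical names, not the PR's code. It records the exit status in papermill_status.txt inside the shared staging directory and uses 127 (what a shell reports for "command not found") when the executable is missing.

```python
import subprocess
from pathlib import Path


def run_papermill_and_record(input_nb, output_nb, staging_dir, papermill_cmd="papermill"):
    """Run papermill and write its exit status to <staging_dir>/papermill_status.txt.

    Hypothetical helper: a sketch of the approach, not the project's implementation.
    """
    try:
        result = subprocess.run([papermill_cmd, str(input_nb), str(output_nb)], check=False)
        status = result.returncode
    except FileNotFoundError:
        # With an argument list, a missing executable raises instead of returning a code,
        # so record 127, the status a shell reports for "command not found".
        status = 127
    # The staging area is on a shared filesystem, so later Argo steps can read this file.
    status_path = Path(staging_dir) / "papermill_status.txt"
    status_path.write_text(f"{status}\n")
    return status
```

Writing a plain integer keeps the file trivial for any later step to parse, whatever language that step is written in.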

[Screenshot taken 2024-05-18 at 23:32:42]

What does this implement/fix?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

Testing checklist:

  1. "Run" works, including "Send to Slack"; papermill_status.txt = 0.
  2. "Run" with an exception in the notebook is reported as "Workflow failed."; papermill_status.txt = 1.
  3. "Run" without papermill installed is reported as "Workflow failed (papermill not found)."; papermill_status.txt = 127.
  4. Same as 1, but with "Run on schedule".
  5. Same as 2, but with "Run on schedule".
  6. Same as 3, but with "Run on schedule".
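The checklist implies a mapping from the recorded exit status to the failure message. Here is a hedged Python sketch of that mapping, assuming the status file holds a single integer. `failure_message` is a hypothetical name, and treating a missing or unreadable file as a generic failure is an assumption, not documented behavior.

```python
from pathlib import Path

# Exit status a shell reports when a command cannot be found.
COMMAND_NOT_FOUND = 127


def failure_message(status_path):
    """Return the status message to show for a failed workflow.

    Hypothetical helper; the message strings are the ones listed in the testing checklist.
    """
    try:
        status = int(Path(status_path).read_text().strip())
    except (OSError, ValueError):
        # No usable status file (for example, papermill never ran): use the generic message.
        return "Workflow failed."
    if status == COMMAND_NOT_FOUND:
        return "Workflow failed (papermill not found)."
    return "Workflow failed."
```

Any status other than 127 falls back to the generic message, so a workflow that fails in a later step (for example, Send to Slack) after papermill exited with 0 is still reported as "Workflow failed."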

Tested on Azure, Nebari 2024.5.1.

Documentation

Access-centered content checklist

Text styling

  • The content is written with plain language (where relevant).
  • If there are headers, they use the proper header tags (with only one level-one header: H1 or # in markdown).
  • All links describe where they link to (for example, check the Nebari website).
  • This content adheres to the Nebari style guides.

Non-text content

  • All content is represented as text (for example, images need alt text, and videos need captions or descriptive transcripts).
  • If there are emojis, there are not more than three in a row.
  • Don't use flashing GIFs or videos.
  • If the content were to be read as plain text, it still makes sense, and no information is missing.

Any other comments?

@nkaretnikov
Collaborator Author

This is tested and ready for review.

@nkaretnikov marked this pull request as ready for review May 19, 2024 00:12
@nkaretnikov added the labels "area: user experience 👩🏻‍💻", "needs: review 👀" (This PR is complete and ready for reviewing), "status: in review 👀" (This PR is currently being reviewed by the team) and "type: enhancement 💅🏼" (New feature or request) May 19, 2024
@krassowski self-requested a review May 20, 2024 07:54
@krassowski
Member

I'm trying to test this but ran into jupyter-server/jupyter-scheduler#519. I see this issue both locally and on active deployments.

I think we may want to pin jupyter-scheduler in the docker images in the meantime to avoid this issue on deployments.

@viniciusdc

Thanks @nkaretnikov for having a look at this! And thanks @krassowski for testing it out as well 🚀

> I think we may want to pin jupyter-scheduler in the docker images in the meantime to avoid this issue on deployments.

I would suggest we go with this option until this is addressed upstream.

@krassowski
Member

OK, pinning jupyter-scheduler helped, but I ran into another issue when using a self-signed certificate:

[Screenshot of the error]

I am retrying with a Let's Encrypt certificate now.

@nkaretnikov
Collaborator Author

@krassowski Hey Mike, any update here? Let me know if you need help testing. And yes, I used this in my config, so I didn't run into any issues with certificates:

certificate:
  type: lets-encrypt
  acme_email: <email>
  acme_server: https://acme-v02.api.letsencrypt.org/directory

@krassowski
Member

I did try with lets-encrypt and I am still getting the SSL error. The full traceback extracted from the pod is:

argo_jupyter_scheduler.utils:INFO: conda_env_path: /home/conda/global/envs/global-test
argo_jupyter_scheduler.utils:INFO: output_path: /home/mike/.local/share/jupyter/scheduler_staging_area/81c5b948-4b96-4f35-84ed-39c835723e8a/Untitled4-1969-12-31-06-00-00-PM.ipynb
argo_jupyter_scheduler.utils:INFO: log_path: /home/mike/.local/share/jupyter/scheduler_staging_area/81c5b948-4b96-4f35-84ed-39c835723e8a/logs.txt
argo_jupyter_scheduler.utils:INFO: html_path: /home/mike/.local/share/jupyter/scheduler_staging_area/81c5b948-4b96-4f35-84ed-39c835723e8a/Untitled4-1969-12-31-06-00-00-PM.html
argo_jupyter_scheduler.utils:INFO: papermill_status_path: /home/mike/.local/share/jupyter/scheduler_staging_area/81c5b948-4b96-4f35-84ed-39c835723e8a/papermill_status.txt
Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connection.py", line 653, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connection.py", line 806, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 465, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 509, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/opt/conda/envs/default/lib/python3.11/ssl.py", line 1382, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/adapters.py", line 564, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='nebari', port=443): Max retries exceeded with url: /argo/api/v1/workflows/dev (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))

@krassowski
Member

The date looks odd: 1969-12-31-06-00-00-PM. Maybe that's why the certificate validation is failing?

@krassowski
Member

krassowski commented May 26, 2024

It seems the error is coming from here:

import urllib3

http = urllib3.PoolManager()
response = http.request("GET", url, headers={"Authorization": f"Bearer {token}"})

Edit: passing cert_reqs='CERT_NONE' to urllib3.PoolManager should resolve the issue. I will try this from a branch.
Edit 2: to clarify, we should only add it here conditionally. Maybe Nebari could pass information via an environment variable indicating that this is a local deployment, in which case we could disable verification.
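A minimal sketch of the conditional approach, assuming Nebari exported an environment variable for local deployments. The variable name `NEBARI_DISABLE_TLS_VERIFY` and both helper names are hypothetical; `cert_reqs` is a real `urllib3.PoolManager` keyword. Verification stays on unless the flag is explicitly set.

```python
import os

# Hypothetical variable name: Nebari does not necessarily export anything like this.
TLS_FLAG = "NEBARI_DISABLE_TLS_VERIFY"


def pool_manager_kwargs(env=None):
    """Choose urllib3 TLS settings; verify certificates unless the flag is explicitly set."""
    env = os.environ if env is None else env
    disable = env.get(TLS_FLAG, "").strip().lower() in {"1", "true", "yes"}
    return {"cert_reqs": "CERT_NONE" if disable else "CERT_REQUIRED"}


def make_pool_manager(env=None):
    # Imported lazily so the decision logic above can be used without urllib3 installed.
    import urllib3

    return urllib3.PoolManager(**pool_manager_kwargs(env))
```

With `CERT_NONE`, urllib3 emits an `InsecureRequestWarning` for each unverified HTTPS request. Note that the traceback earlier in this thread fails inside requests (requests/adapters.py), so changing only this urllib3 call site may not be enough on its own.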

@nkaretnikov
Collaborator Author

@krassowski

> The date looks odd: 1969-12-31-06-00-00-PM. Maybe that's why the certificate validation is failing?

This is expected: it's a placeholder filename set to the Unix epoch (1970-01-01 00:00:00 UTC, which is 1969-12-31 06:00:00 PM in a UTC-6 time zone), so that we can find the file. The rename-files task later renames it to the proper date. It has nothing to do with the current date on the system.
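A minimal sketch that reproduces the placeholder name, assuming the pod's time zone is UTC-6. The format string is inferred from the observed file name, and the helper names are hypothetical rather than taken from argo-jupyter-scheduler.

```python
from datetime import datetime, timedelta, timezone

# Inferred from the observed name "Untitled4-1969-12-31-06-00-00-PM.ipynb"; an assumption.
TIMESTAMP_FORMAT = "%Y-%m-%d-%I-%M-%S-%p"


def timestamped_name(stem, ext, when):
    """Build '<stem>-<timestamp><ext>' from a timezone-aware datetime."""
    return f"{stem}-{when.strftime(TIMESTAMP_FORMAT)}{ext}"


def placeholder_name(stem, ext, tz):
    """Name derived from the Unix epoch (timestamp 0), so it is easy to find and rename later."""
    return timestamped_name(stem, ext, datetime.fromtimestamp(0, tz=tz))


utc_minus_6 = timezone(timedelta(hours=-6))
print(placeholder_name("Untitled4", ".ipynb", utc_minus_6))
# → Untitled4-1969-12-31-06-00-00-PM.ipynb
```

A rename step would then call `timestamped_name` with the job's real timestamp; in UTC the same placeholder would read `1970-01-01-12-00-00-AM` instead.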


I think the issue might be due to dependencies changing in your deployment since you had to pin the package.

Here's how I personally test this. I push the current upstream JupyterLab image to my Docker Hub (not part of this script) and use it as the base image (BASE_TAG). Then I pull it, install argo-jupyter-scheduler from my feature branch into it, and push it to Docker Hub again (NEW_TAG). Then I use that image in the Nebari deploy config by specifying a custom JupyterLab image:

#!/usr/bin/env bash
# Usage: <script> BASE_TAG NEW_TAG
set -euxo pipefail

BASE_TAG=$1
NEW_TAG=$2
IMAGE=nkaretnikov/nebari-jupyterlab-papermill-error-8

# Start a detached container from the base image; stop it however the script exits.
ID=$(docker run -tid "${IMAGE}:${BASE_TAG}")
trap 'docker stop "$ID"' EXIT

# Install argo-jupyter-scheduler from the feature branch into the "default" conda env,
# then check that the package imports.
docker exec "$ID" /bin/sh -c 'conda run --no-capture-output -n default /bin/sh -c "pip install git+https://github.com/nkaretnikov/argo-jupyter-scheduler@papermill-error-8"'
docker exec "$ID" /bin/sh -c 'conda run --no-capture-output -n default python -c "from argo_jupyter_scheduler import executor"'

# Snapshot the container as the new tag and push it.
docker commit "$ID" "${IMAGE}:${NEW_TAG}"
docker push "${IMAGE}:${NEW_TAG}"

echo "UPDATE TAG IN NEBARI CONFIGS!"
echo "jupyterlab: docker.io/${IMAGE}:${NEW_TAG}"

On my Docker Hub, the upstream base tag is x86_0, and the one that was tested is x86_4:

https://hub.docker.com/repository/docker/nkaretnikov/nebari-jupyterlab-papermill-error-8/tags

You could try pulling these images and checking whether the package versions are different from yours.

If you're going to use this script yourself, you might need to change the GitHub and Docker Hub repositories.
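One way to compare package versions is to dump `pip freeze` from each image's `default` environment (with something like `docker run --rm IMAGE conda run -n default pip freeze`) and diff the two lists. The Python sketch below does the diff. The helper names are hypothetical, and it only understands the plain `name==version` and `name @ url` lines that `pip freeze` typically prints.

```python
import re


def parse_freeze(text):
    """Map normalized package name -> version (or direct URL) from `pip freeze` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments and options such as "-e ..."
        if " @ " in line:
            name, _, spec = line.partition(" @ ")
        elif "==" in line:
            name, _, spec = line.partition("==")
        else:
            name, spec = line, ""
        # PEP 503 normalization so "Jupyter_Scheduler" and "jupyter-scheduler" compare equal.
        packages[re.sub(r"[-_.]+", "-", name.strip()).lower()] = spec.strip()
    return packages


def diff_versions(old_text, new_text):
    """Return {name: (old, new)} for packages whose version differs or exists on one side only."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

A package missing from one image shows up with `None` on that side, which makes an unexpected upgrade (for example, of requests) easy to spot.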

@krassowski
Member

> I think the issue might be due to dependencies changing in your deployment since you had to pin the package.

Yes, I think this is it. More specifically, I think this is related to the requests upgrade, similar to nebari-dev/nebari#2477; see the explanation in nebari-dev/nebari#2481 (comment). I could not find a connection to the requests package at first, but now I see it further down, in a part of the traceback that I initially disregarded:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.11/site-packages/jupyter_scheduler/executors.py", line 60, in process
    self.execute()
  File "/opt/conda/envs/default/lib/python3.11/site-packages/argo_jupyter_scheduler/executor.py", line 98, in execute
    self.create_workflow(
  File "/opt/conda/envs/default/lib/python3.11/site-packages/argo_jupyter_scheduler/executor.py", line 287, in create_workflow
    w.create()
  File "/opt/conda/envs/default/lib/python3.11/site-packages/hera/workflows/workflow.py", line 363, in create
    wf = self.workflows_service.create_workflow(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/hera/workflows/service.py", line 843, in create_workflow
    resp = requests.post(
           ^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/requests/adapters.py", line 620, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='nebari', port=443): Max retries exceeded with url: /argo/api/v1/workflows/dev (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))

The dependency on requests must come through hera.
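
The easy-to-miss part of that traceback is Python's implicit exception chaining: the "During handling of the above exception, another exception occurred" banner means the final exception's __context__ points at the earlier one. When logs are long, printing every exception in the chain explicitly can surface a root cause such as the self-signed certificate error. The following is a minimal stdlib sketch; iter_exception_chain, RetryError, TLSVerifyError and fake_post are hypothetical names used only for illustration, not part of requests, hera or argo-jupyter-scheduler.

```python
from typing import Iterator, Optional


def iter_exception_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc, then each exception it was chained from (newest first).

    Follows the same rule the traceback module uses when printing:
    __cause__ (explicit "raise ... from ...") takes priority; otherwise
    __context__ (implicit chaining, shown as "During handling of the above
    exception, another exception occurred") unless it was suppressed.
    """
    seen = set()  # ids already yielded, guards against reference cycles
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        if current.__cause__ is not None:
            current = current.__cause__
        elif not current.__suppress_context__:
            current = current.__context__
        else:
            current = None


# Hypothetical stand-ins for a low-level retry error wrapped by a TLS error.
class RetryError(Exception):
    pass


class TLSVerifyError(Exception):
    pass


def fake_post() -> None:
    try:
        raise RetryError("max retries exceeded")
    except RetryError:
        # Raised while handling RetryError, so Python sets __context__.
        raise TLSVerifyError("certificate verify failed: self-signed certificate")


try:
    fake_post()
except TLSVerifyError as err:
    for e in iter_exception_chain(err):
        print(f"{type(e).__name__}: {e}")
```

Walking __cause__ before __context__ keeps the output in the same order the interpreter would print the chain, just newest first.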

@krassowski
Member

Small progress here: after adding from hera.shared import GlobalConfig; GlobalConfig.verify_ssl = False, I am now getting a different error:

/opt/conda/envs/default/lib/python3.11/site-packages/urllib3/connectionpool.py:1103: InsecureRequestWarning: Unverified HTTPS request is being made to host 'nebari'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(
Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.11/site-packages/jupyter_scheduler/executors.py", line 60, in process
    self.execute()
  File "/opt/conda/envs/default/lib/python3.11/site-packages/argo_jupyter_scheduler/executor.py", line 101, in execute
    self.create_workflow(
  File "/opt/conda/envs/default/lib/python3.11/site-packages/argo_jupyter_scheduler/executor.py", line 290, in create_workflow
    w.create()
  File "/opt/conda/envs/default/lib/python3.11/site-packages/hera/workflows/workflow.py", line 363, in create
    wf = self.workflows_service.create_workflow(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/default/lib/python3.11/site-packages/hera/workflows/service.py", line 858, in create_workflow
    raise exception_from_server_response(resp)
hera.exceptions.InternalServerError: Server returned status code 500 with message: `Internal error occurred: failed calling webhook "wf-mutating-admission-controller.dev.svc": failed to call webhook: Post "https://nebari/argo/mutate?timeout=10s": tls: failed to verify certificate: x509: certificate is valid for d92662fb799af1756d31ba6cc97fcb3a.82d4aee8cf4f91ed800d55bc54359963.traefik.default, not nebari`
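
For context on the workaround above: disabling verification skips both the certificate-chain check and the hostname check, which is why urllib3 now emits InsecureRequestWarning. Presumably hera forwards GlobalConfig.verify_ssl to the verify argument of requests; requests also accepts a path to a CA bundle there, which trusts one specific CA instead of disabling checks entirely. The sketch below illustrates the two options with Python's stdlib ssl module only; make_ssl_context is a hypothetical helper, not hera's or requests' API.

```python
import ssl
from typing import Optional


def make_ssl_context(verify: bool = True, cafile: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: build a client-side TLS context.

    verify=False mirrors what turning verification off does: no certificate
    chain check and no hostname check. cafile instead trusts a private CA
    (for example the one that signed a self-signed cluster certificate)
    while keeping both checks enabled.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        # check_hostname must be cleared before verify_mode can be CERT_NONE,
        # otherwise the ssl module raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = make_ssl_context()
insecure = make_ssl_context(verify=False)
print(strict.verify_mode, strict.check_hostname)
print(insecure.verify_mode, insecure.check_hostname)
```

A context like this only affects the client making the request; it cannot change how other components in the cluster verify certificates.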

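The new error is a hostname mismatch rather than an untrusted CA: the certificate served for https://nebari/argo/mutate is only valid for a name ending in .traefik.default (which looks like Traefik's self-generated default certificate), so "nebari" fails the check. Since it is reported while "calling webhook", it appears to come from the admission webhook call made inside the cluster, so a client-side flag like verify_ssl would not be expected to affect it. The sketch below is a deliberately simplified version of the name check in the spirit of RFC 6125 (exact match, or a single "*" as the whole left-most label); hostname_matches is a hypothetical helper, and real TLS stacks handle more cases.

```python
def hostname_matches(hostname: str, cert_names: list[str]) -> bool:
    """Simplified DNS-name check against a certificate's names.

    Accepts an exact case-insensitive match, or a wildcard pattern whose
    entire left-most label is '*', which matches exactly one label.
    """
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            suffix = pattern[2:]
            head, dot, rest = host.partition(".")
            # The wildcard covers one non-empty label; the rest must match.
            if dot and head and rest == suffix:
                return True
    return False


traefik_default = (
    "d92662fb799af1756d31ba6cc97fcb3a.82d4aee8cf4f91ed800d55bc54359963.traefik.default"
)
print(hostname_matches("nebari", [traefik_default]))
```

With the name taken from the error message, the check fails for "nebari", matching the x509 error above.
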
@nkaretnikov
Collaborator Author

@krassowski I've updated and re-tested the code with a notebook containing parens and spaces. PTAL.
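
Notebook names with spaces and parentheses matter here because, per the PR description, the papermill exit status is captured by a shell command and written to a status file. Unquoted, such names are split or rejected by the shell. A minimal sketch of the quoting, assuming a POSIX shell; build_papermill_command and the example paths are hypothetical and not the PR's actual implementation.

```python
import shlex


def build_papermill_command(input_path: str, output_path: str, status_path: str) -> str:
    """Hypothetical helper: run papermill, then record its exit status in a file.

    Every path goes through shlex.quote so names containing spaces or
    parentheses reach papermill as single arguments. POSIX shells report
    127 in $? when the command itself cannot be found, which is how a
    missing papermill can be told apart from a failing notebook.
    """
    return (
        f"papermill {shlex.quote(input_path)} {shlex.quote(output_path)}; "
        f"echo $? > {shlex.quote(status_path)}"
    )


cmd = build_papermill_command(
    "My Notebook (draft).ipynb",
    "My Notebook (draft)-output.ipynb",
    "papermill_status.txt",
)
print(cmd)
```

shlex.quote leaves names made of safe characters (such as papermill_status.txt) untouched and wraps anything else in single quotes.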

@nkaretnikov nkaretnikov requested a review from krassowski May 28, 2024 15:52
Member

@krassowski krassowski left a comment

Thank you @nkaretnikov!

@nkaretnikov nkaretnikov merged commit 5dea3f7 into nebari-dev:main May 29, 2024
5 checks passed
@nkaretnikov nkaretnikov removed the needs: review 👀 and status: in review 👀 labels May 29, 2024
Development

Successfully merging this pull request may close these issues.

Can we raise a more informative error when papermill is not installed?
3 participants