Generate HTML output and send it to Slack, make output files downloadable in the web UI #3

Merged · 62 commits · Jan 11, 2024
Commits
7419b2d
Generate HTML output and send it to Slack
nkaretnikov Nov 12, 2023
5bbf836
Remove redundant format string
nkaretnikov Dec 5, 2023
5692f51
Use `parameters` instead of iterating over `envs`
nkaretnikov Dec 5, 2023
a1c8c59
Use `requests` in `send_to_slack`
nkaretnikov Dec 5, 2023
7a4e650
Use variables in the `papermill` command
nkaretnikov Dec 5, 2023
243c7ad
Fix linting issues
nkaretnikov Dec 5, 2023
d212f09
Use `os.path.basename` since `script` args are serialized
nkaretnikov Dec 6, 2023
eb57d45
Account for `parameters` being `None`
nkaretnikov Dec 6, 2023
3ff0866
Add docs on how to send to Slack
nkaretnikov Dec 6, 2023
27eadc9
Log Slack script output to a file
nkaretnikov Dec 6, 2023
d8a4624
Add missing import
nkaretnikov Dec 6, 2023
ebdff98
Try passing `logger` to the Slack script
nkaretnikov Dec 6, 2023
be83ac4
Try using a local logger since it cannot be an arg
nkaretnikov Dec 6, 2023
b5ac5f9
Fix the import
nkaretnikov Dec 6, 2023
060d837
Log `staging_paths`
nkaretnikov Dec 7, 2023
2c5dc9e
Try using filenames from `staging_paths`
nkaretnikov Dec 7, 2023
7a5d3d4
Log `job.create_time`
nkaretnikov Dec 7, 2023
f7c391e
Log to file in workflows
nkaretnikov Dec 7, 2023
ec75989
Append to log file since logger uses it before that
nkaretnikov Dec 7, 2023
d6b91fb
Use filenames with correct start time for cron jobs
nkaretnikov Dec 7, 2023
6325d4a
Move Container out of Steps
nkaretnikov Dec 7, 2023
322b264
Try moving main into a script
nkaretnikov Dec 7, 2023
fb51c6e
Add missing imports
nkaretnikov Dec 7, 2023
b95bc36
Revert "Add missing imports"
nkaretnikov Dec 7, 2023
2129c76
Revert "Try moving main into a script"
nkaretnikov Dec 7, 2023
f2827eb
Try moving main out of Steps
nkaretnikov Dec 7, 2023
3dc76d8
Revert "Try moving main out of Steps"
nkaretnikov Dec 7, 2023
bf98603
Try moving main out of Steps and use Parameters
nkaretnikov Dec 7, 2023
5c6aa59
Try calling the container directly
nkaretnikov Dec 7, 2023
3824941
Add missing inputs
nkaretnikov Dec 7, 2023
6dcbc71
Try generating start_time within a step
nkaretnikov Dec 7, 2023
0d31ba2
Try passing placeholder value as string
nkaretnikov Dec 7, 2023
adf5231
Try reading `start_time` from the DB
nkaretnikov Dec 7, 2023
673b8bb
Rename variable that clashes with another one
nkaretnikov Dec 7, 2023
52f160a
Order by start_time
nkaretnikov Dec 7, 2023
792b335
Add a rename step
nkaretnikov Dec 7, 2023
1f7a136
Add missing imports
nkaretnikov Dec 8, 2023
9f23570
Add missing imports
nkaretnikov Dec 8, 2023
5b6fce8
Add debug prints
nkaretnikov Dec 8, 2023
6110682
Create a symlink to make files downloadable
nkaretnikov Dec 8, 2023
2f88046
Handle symlinks when deleting scheduled jobs
nkaretnikov Dec 8, 2023
c02cc01
Remove debug prints
nkaretnikov Dec 8, 2023
5c96c3b
Read `start_time` from the DB in `send_to_slack`
nkaretnikov Dec 8, 2023
36cec7a
Always try to run the rename-files step
nkaretnikov Dec 8, 2023
ce33d45
Remove all symlinks related to the same job
nkaretnikov Dec 8, 2023
4ca14e6
Move common code outside of classes
nkaretnikov Dec 8, 2023
c24af7b
Add helpers for default filenames
nkaretnikov Dec 8, 2023
90060e2
Fix typo in function name
nkaretnikov Dec 8, 2023
8cfda20
Clean up `send_to_slack`
nkaretnikov Dec 8, 2023
d596dd5
Clean up `rename_files`
nkaretnikov Dec 8, 2023
93f9c83
Do not create intermediate variable
nkaretnikov Dec 8, 2023
4320226
Call `main` the way it used to be done before
nkaretnikov Dec 8, 2023
f71a435
Use full name when walking files
nkaretnikov Dec 8, 2023
7ec67e3
Silence the linter warning
nkaretnikov Dec 8, 2023
28d4792
Add new doc sections to ToC
nkaretnikov Dec 8, 2023
06f93c0
Add Slack screenshots to the doc
nkaretnikov Dec 8, 2023
82683a3
Resize images in the doc
nkaretnikov Dec 8, 2023
3305f76
Use `create_output_filename` from jupyter-scheduler
nkaretnikov Dec 8, 2023
3a42dd0
Update the comment
nkaretnikov Dec 9, 2023
a303e83
Move logging setup code out of try block
nkaretnikov Dec 17, 2023
503c7cf
Log exception trace
nkaretnikov Dec 17, 2023
195eb5c
Update README
nkaretnikov Dec 18, 2023
109 changes: 97 additions & 12 deletions README.md
@@ -10,17 +10,21 @@
- [argo-jupyter-scheduler](#argo-jupyter-scheduler)
- [Installation](#installation)
- [What is it?](#what-is-it)
- [Optional features](#optional-features)
- [Sending to Slack](#sending-to-slack)
- [A deeper dive](#a-deeper-dive)
- [`Job`](#job)
- [`Job Definition`](#job-definition)
- [Internals](#internals)
- [Output Files](#output-files)
- [Workflow Steps](#workflow-steps)
- [Additional thoughts](#additional-thoughts)
- [Known issues](#known-issues)
- [License](#license)

**Argo-Jupyter-Scheduler**

Submit long-running notebooks to run without the need to keep your JupyterLab server running. And submit a notebook to run on a specified schedule.

## Installation

@@ -30,28 +34,51 @@ pip install argo-jupyter-scheduler

## What is it?

Argo-Jupyter-Scheduler is a plugin to the [Jupyter-Scheduler](https://jupyter-scheduler.readthedocs.io/en/latest/index.html) JupyterLab extension.

What does that mean?

This means this is an application that gets installed in the JupyterLab base image and runs as an extension in JupyterLab. Specifically, you will see this icon at the bottom of the JupyterLab Launcher tab:

<img width="758" alt="Screenshot 2023-07-12 at 20 48 23" src="https://github.com/nebari-dev/argo-jupyter-scheduler/assets/42120229/a0a27a2e-1c75-404c-8fe6-2328cbb31cba">

And this icon on the toolbar of your Jupyter Notebook:

<img width="1227" alt="jupyter-scheduler-icon" src="https://github.com/nebari-dev/argo-jupyter-scheduler/assets/42120229/cae78aec-4d58-4d71-81cf-c73ed293bf64">

This also means, as a lab extension, this application is running within each user's separate JupyterLab server. The record of the notebooks you've submitted is specific to you and you only. There is no central Jupyter-Scheduler.

However, instead of using the base Jupyter-Scheduler, we are using **Argo-Jupyter-Scheduler**.

Why?

If you want to run your Jupyter Notebook on a schedule, you need to be assured that the notebook will be executed at the times you specified. The fundamental limitation with Jupyter-Scheduler is that when your JupyterLab server is not running, Jupyter-Scheduler is not running. Then the notebooks you had scheduled won't run. What about notebooks that you want to run right now? If the JupyterLab server is down, then how will the status of the notebook run be recorded?

The solution is Argo-Jupyter-Scheduler: Jupyter-Scheduler front-end with an Argo-Workflows back-end.

## Optional features

### Sending to Slack

Argo-Jupyter-Scheduler allows sending HTML output of an executed notebook to a
Slack channel:

- See the Slack API docs on how to create a bot token (starts with `xoxb`)
- Invite your bot to the Slack channel that will be used for sending output
- When scheduling a notebook (as described above):
- Select a conda environment that has `papermill` installed
- Add the following `Parameters`:
- name: `SLACK_TOKEN`, value: `xoxb-<Slack bot token>`
- name: `SLACK_CHANNEL`, value: `<Slack channel name>` (without leading `#`, like `scheduled-jobs`).

Create job:

<img src="./assets/create-job-slack.png" alt="Create job Slack" width="400"/>

Slack output:

<img src="./assets/slack-output.png" alt="Slack output" width="800"/>

## A deeper dive

In the Jupyter-Scheduler lab extension, you can create two things: a `Job` and a `Job Definition`.
@@ -75,7 +102,7 @@ We are also relying on the [Nebari Workflow Controller](https://github.com/nebar

A `Job Definition` is simply a way to create Jobs that run on a specified schedule.

In Argo-Jupyter-Scheduler, a `Job Definition` translates into a `Cron-Workflow` in Argo-Workflows. So when you create a `Job Definition`, you create a `Cron-Workflow`, which in turn creates a `Workflow` to run when scheduled.

A `Job` is to `Workflow` as `Job Definition` is to `Cron-Workflow`.

@@ -84,12 +111,70 @@

Jupyter-Scheduler creates and uses a `scheduler.sqlite` database to manage and keep track of the Jobs and Job Definitions. If you can ensure this database is accessible and updated when the status of a job or job definition changes, then you can ensure the view the user sees in JupyterLab is accurate.

> By default this database is located at `~/.local/share/jupyter/scheduler.sqlite` but this is a traitlet that can be modified. And since we have access to this database, we can update the database directly from the workflow itself.

To accomplish this, the workflow runs in two steps. First, the workflow runs the notebook using `papermill` and the specified conda environment. Second, depending on the success of the notebook run, it updates the database with this status.

And when a job definition is created, a corresponding cron-workflow is created. To ensure the database is properly updated, the workflow that the cron-workflow creates has these three steps. First, create a job record in the database with a status of `IN PROGRESS`. Second, run the notebook, again using `papermill` and the conda environment specified. And third, update the newly created job record with the status of the notebook run.
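
Since the database is plain SQLite, a workflow step can update it directly. A
minimal sketch of the idea, assuming Jupyter-Scheduler's default database
location and a `jobs` table with `job_id` and `status` columns (names may vary
between versions):

```python
import sqlite3
from pathlib import Path

# Default location; configurable via a Jupyter-Scheduler traitlet.
DB_PATH = Path.home() / ".local/share/jupyter/scheduler.sqlite"


def update_job_status(job_id: str, status: str) -> None:
    """Record the outcome of a notebook run, e.g. 'COMPLETED' or 'FAILED'."""
    with sqlite3.connect(DB_PATH) as conn:  # commits on success
        conn.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ?",
            (status, job_id),
        )
```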

### Output Files

In addition to `papermill`, which creates the output notebook, `jupyter nbconvert` is used to produce HTML output. To make these output files downloadable via the web UI, it's important they match the format that Jupyter-Scheduler expects, which is achieved by reusing `create_output_filename` from Jupyter-Scheduler within Argo-Jupyter-Scheduler when creating output files.
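
For illustration, this is roughly how the helper is used; the exact signature
and timestamp format belong to Jupyter-Scheduler and may vary between versions
(the create time is assumed to be a millisecond timestamp):

```python
from jupyter_scheduler.utils import create_output_filename

create_time_ms = 1_702_000_000_000  # example job create time, in milliseconds
html_name = create_output_filename("analysis.ipynb", create_time_ms, "html")
ipynb_name = create_output_filename("analysis.ipynb", create_time_ms, "ipynb")
# Both names embed the same timestamp, so the web UI can associate
# them with the job that produced them.
print(html_name, ipynb_name)
```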

The expected output filenames include timestamps that must match the start time
of a job. For cron jobs this is tricky because the start time is set whenever
`create-job-id` is run. All workflow steps are run in separate containers and
the `create-job-id` container is run after the `papermill` step, which creates
the output files.

Also, the `papermill` container is defined differently because it needs access
to the filesystem mount points where the `papermill` and `jupyter` commands are
located, as well as to the environment variables. Due to this, the commands
executed within the `papermill` container cannot be changed once it has been
defined.

This also means that the `papermill` container cannot have access to the job
start time and hence cannot create filenames with the expected timestamps. To
solve this problem, the `papermill` step always creates output files with the
same default filenames and there is an additional `rename-files` step that runs
after `create-job-id`, which makes sure the timestamps match the job start time.
To pass the start time value between containers, the SQLite database is used.
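
A sketch of the `rename-files` idea, assuming a `jobs` table with a
`start_time` column and hypothetical default output filenames; the actual step
may differ:

```python
import sqlite3
from pathlib import Path

from jupyter_scheduler.utils import create_output_filename

DB_PATH = Path.home() / ".local/share/jupyter/scheduler.sqlite"
DEFAULT_OUTPUTS = ("output.ipynb", "output.html")  # hypothetical default names


def rename_outputs(job_id: str, staging_dir: Path, input_filename: str) -> None:
    # Read the job start time that `create-job-id` wrote to the database.
    with sqlite3.connect(DB_PATH) as conn:
        (start_time,) = conn.execute(
            "SELECT start_time FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
    # Rename each default output file to the timestamped name the UI expects.
    for name in DEFAULT_OUTPUTS:
        path = staging_dir / name
        if path.exists():
            fmt = path.suffix.lstrip(".")  # "ipynb" or "html"
            new_name = create_output_filename(input_filename, start_time, fmt)
            path.rename(staging_dir / new_name)
```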

Finally, because `create-job-id` creates a new job every time it runs, this job
will also have a new id. The job id is important since it's the same as the
name of the staging directory where Jupyter-Scheduler expects to find the job
output files. But the output files are created in the `papermill` step, which
uses the id of the job that originally defined the workflow when it was
scheduled, not the one just created in `create-job-id`. To point to the proper
location on disk, a symlink is created connecting the staging job directories.
This is also done in the `rename-files` step by looking up the job ids in the
SQLite database.
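
The symlink part can be as simple as the following sketch (the staging-area
path is assumed to be Jupyter-Scheduler's default, which is configurable):

```python
import os
from pathlib import Path

# Jupyter-Scheduler's default staging area (assumed).
STAGING_ROOT = Path.home() / ".local/share/jupyter/scheduler_staging_area"


def link_staging_dirs(original_job_id: str, new_job_id: str) -> None:
    src = STAGING_ROOT / original_job_id  # where `papermill` wrote the files
    dst = STAGING_ROOT / new_job_id       # where Jupyter-Scheduler looks
    if not dst.exists():
        os.symlink(src, dst, target_is_directory=True)
```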

For non-cron jobs, there is no `create-job-id` step. The rest of the workflow
steps are the same, but no database lookups are performed and no symlinks are
created. Neither is necessary because the start time is immediately available
and the job id matches the job staging area.

### Workflow Steps

Here's an overview of the workflow steps:

- `main` runs `papermill` and `jupyter nbconvert` to create output files
- `create-job-id` creates a new job that can run without JupyterLab (only
for cron jobs)
- `rename-files` updates timestamps of output files and adds symlinks between
job staging directories
- `send-to-slack` sends HTML output to a Slack channel (only if `SLACK_TOKEN`
and `SLACK_CHANNEL` are provided via `Parameters` when scheduling a job)
- `failure` or `success` sets the job status to "Failed" or "Completed" in the web UI

These steps are executed sequentially in separate containers. If a step fails,
the `failure` step is called in the end. Otherwise, the `success` step is
called.
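
The commit history suggests the steps are defined with Hera-style `Container`
and `Steps` objects. A simplified, hypothetical sketch of how such a sequence
could be wired up (all images, commands, and script names below are
placeholders, not the project's actual definitions):

```python
from hera.workflows import Container, Steps, Workflow

with Workflow(generate_name="scheduled-notebook-", entrypoint="steps") as w:
    main = Container(
        name="main",
        image="python:3.11",  # placeholder; the real image provides papermill/jupyter
        command=[
            "sh",
            "-c",
            "papermill input.ipynb output.ipynb"
            " && jupyter nbconvert --to html output.ipynb",
        ],
    )
    rename_files = Container(
        name="rename-files", image="python:3.11", command=["python", "rename_files.py"]
    )
    send_to_slack = Container(
        name="send-to-slack", image="python:3.11", command=["python", "send_to_slack.py"]
    )

    # Steps run sequentially, each in its own container.
    with Steps(name="steps"):
        main(name="main")
        rename_files(name="rename-files")
        send_to_slack(name="send-to-slack")

print(w.to_yaml())  # render the Argo Workflow manifest
```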

## Additional Thoughts
