Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crons: Migrate to Cronsim library over croniter #80208

Closed
15 tasks done
evanpurkhiser opened this issue Nov 4, 2024 · 6 comments
Closed
15 tasks done

crons: Migrate to Cronsim library over croniter #80208

evanpurkhiser opened this issue Nov 4, 2024 · 6 comments

Comments

@evanpurkhiser
Copy link
Member

evanpurkhiser commented Nov 4, 2024

Given problems like #66868 and the difficulty of working in the croniter codebase, we should move to cronsim.

Fixes #66868

Here's what we'll need to do to be able to move to this library

Tasks

Preview Give feedback
  1. Scope: Backend
  2. Scope: Backend
  3. Scope: Backend
  4. Scope: Backend
  5. Scope: Backend
  6. Scope: Backend
  7. Scope: Backend
@evanpurkhiser
Copy link
Member Author

Validating DST

Just looking at the cronsim tests for DST seem very reasonable that it works correctly https://github.com/cuu508/cronsim/blob/e8bc4be781f00f0ad6d11373a11be055d7a3b4da/tests/test_cronsim.py#L293

@evanpurkhiser
Copy link
Member Author

evanpurkhiser commented Nov 4, 2024

Validate that the library can parse all our schedules

Using this code

from datetime import datetime, UTC
from cronsim import CronSim

valid_count = 0
invalid_expressions = []

dt = datetime(2024, 1, 1, 0, 0, tzinfo=UTC)


with open('crons.txt', "r") as file:
    valid_count = 0
    for line_num, expression in enumerate(file, start=1):
        cron_expr = expression.strip()
        try:
            # Attempt to parse and simulate the cron expression
            CronSim(cron_expr, dt)
            valid_count += 1

        except Exception as e:
            # Store invalid expressions with line number and error
            invalid_expressions.append((line_num, cron_expr, str(e)))

# Print results
if invalid_expressions:
    for line_num, cron_expr, error in invalid_expressions:
        print(error)
else:
    print("No invalid expressions found.")

on all existing crontab expressions in our database we get two results

$ python test.py | sort | uniq
Bad day-of-week
Wrong number of fields
  • It looks like we have a number of expressions that still include the seconds in their crontab
  • Cronsim does not yet support wrap-around style TUE-SUN type fields. This will be fixed by Support "wrap around" DOW cuu508/cronsim#3
    • After some discussion with @cuu508 they've pointed out this is a non-standard expression. I'll go ahead and correct these for our customers.

The migration for the incorrect number of expressions is here

I've also fixed up the whitespace while in here with

This is complete and was run in US and DE

I've confirmed locally after cleaning up the schedules and fixing the warp around that the output is now

$ python test.py | sort | uniq
No invalid expressions found.

@JoshFerge
Copy link
Member

do we need to ensure that in addition to the library being able to parse our current crons, that the output is the same between them?

@evanpurkhiser
Copy link
Member Author

evanpurkhiser commented Nov 4, 2024

do we need to ensure that in addition to the library being able to parse our current crons, that the output is the same between them?

Yep, that's why we have these two steps

Validate that the library can parse all our schedules
Validate that all schedules are computed the same between croniter and cronsim

@evanpurkhiser
Copy link
Member Author

evanpurkhiser commented Nov 5, 2024

Validate that all schedules are computed the same between croniter and cronsim

To do this I've written a script that runs against all of the cron expressions we currently have in our database. For each expression we compute the next schedule from every hour of every day in the year.

import sys
from datetime import UTC, datetime, timedelta

from croniter import croniter
from cronsim import CronSim

valid_count = 0
invalid_expressions = []

start = datetime(2024, 1, 1, 0, 0, tzinfo=UTC)

bad_expressions = set()


with open("crons.txt", "r") as file:
    lines = list(l for l in file)
    total = len(lines)
    curr = 0

    for line_num, expression in enumerate(lines):
        curr = curr + 1
        cron_expr = expression.strip()

        print(f"{curr}/{total}", end="\r")

        now = start
        next_year = start + timedelta(days=365)

        while now < next_year:
            if cron_expr in bad_expressions:
                break

            cronsim_next = next(CronSim(cron_expr, now))
            croniter_next = croniter(cron_expr, now, ret_type=datetime).get_next()

            if cronsim_next != croniter_next:
                bad_expressions.add(cron_expr)
                sys.stdout.write("\033[K")
                print(f"bad for {cron_expr}")
                print(f" from     -> {now}")
                print(f" croniter -> {croniter_next}")
                print(f" cronsim  -> {cronsim_next}")
                print()

            now = now + timedelta(hours=1)

Here's the output. All differences as far as I can tell are due to how croniter incorrectly computes the step.

$ python test.py

bad for 00 06 */10 * *
 from     -> 2024-02-21 06:00:00+00:00
 croniter -> 2024-03-11 06:00:00+00:00
 cronsim  -> 2024-03-01 06:00:00+00:00

bad for 0 0 */15 * *
 from     -> 2024-02-16 00:00:00+00:00
 croniter -> 2024-03-16 00:00:00+00:00
 cronsim  -> 2024-03-01 00:00:00+00:00

bad for 0 0 1/6 * *
 from     -> 2024-02-25 00:00:00+00:00
 croniter -> 2024-03-07 00:00:00+00:00
 cronsim  -> 2024-03-01 00:00:00+00:00

bad for 00 20 */3 * *
 from     -> 2024-02-28 20:00:00+00:00
 croniter -> 2024-03-04 20:00:00+00:00
 cronsim  -> 2024-03-01 20:00:00+00:00

bad for 0 0 */3 * *
 from     -> 2024-02-28 00:00:00+00:00
 croniter -> 2024-03-04 00:00:00+00:00
 cronsim  -> 2024-03-01 00:00:00+00:00

bad for 0 0 */6 * *
 from     -> 2024-02-25 00:00:00+00:00
 croniter -> 2024-03-07 00:00:00+00:00
 cronsim  -> 2024-03-01 00:00:00+00:00

bad for 0 10 */14 * WED
 from     -> 2024-01-01 00:00:00+00:00
 croniter -> 2024-01-01 10:00:00+00:00
 cronsim  -> 2024-05-01 10:00:00+00:00

bad for 0 1 */3 * *
 from     -> 2024-02-28 01:00:00+00:00
 croniter -> 2024-03-04 01:00:00+00:00
 cronsim  -> 2024-03-01 01:00:00+00:00

bad for 0 14 */5 * *
 from     -> 2024-02-26 14:00:00+00:00
 croniter -> 2024-03-06 14:00:00+00:00
 cronsim  -> 2024-03-01 14:00:00+00:00

bad for 0 23/2,1-10/2 * * *
 from     -> 2024-01-01 01:00:00+00:00
 croniter -> 2024-01-01 02:00:00+00:00
 cronsim  -> 2024-01-01 03:00:00+00:00

bad for 0 3 */3 * *
 from     -> 2024-02-28 03:00:00+00:00
 croniter -> 2024-03-04 03:00:00+00:00
 cronsim  -> 2024-03-01 03:00:00+00:00

bad for 0 4 1-7 * */7
 from     -> 2024-01-01 00:00:00+00:00
 croniter -> 2024-01-01 04:00:00+00:00
 cronsim  -> 2024-01-07 04:00:00+00:00

bad for 0 4 */5 * *
 from     -> 2024-02-26 04:00:00+00:00
 croniter -> 2024-03-06 04:00:00+00:00
 cronsim  -> 2024-03-01 04:00:00+00:00

bad for 0 9 */100,1-7 * 3
 from     -> 2024-01-01 00:00:00+00:00
 croniter -> 2024-01-01 09:00:00+00:00
 cronsim  -> 2024-01-03 09:00:00+00:00

bad for 10 20 */5 * *
 from     -> 2024-02-26 21:00:00+00:00
 croniter -> 2024-03-06 20:10:00+00:00
 cronsim  -> 2024-03-01 20:10:00+00:00

bad for 15 20 */5 * *
 from     -> 2024-02-26 21:00:00+00:00
 croniter -> 2024-03-06 20:15:00+00:00
 cronsim  -> 2024-03-01 20:15:00+00:00

bad for 15 6 */3 * *
 from     -> 2024-02-28 07:00:00+00:00
 croniter -> 2024-03-04 06:15:00+00:00
 cronsim  -> 2024-03-01 06:15:00+00:00

bad for 20-59/5 1-1 * * *
 from     -> 2024-01-01 00:00:00+00:00
 croniter -> 2024-01-01 00:20:00+00:00
 cronsim  -> 2024-01-01 01:20:00+00:00

bad for 30 1 */3 * *
 from     -> 2024-02-28 02:00:00+00:00
 croniter -> 2024-03-04 01:30:00+00:00
 cronsim  -> 2024-03-01 01:30:00+00:00

bad for 30 4 */3 * *
 from     -> 2024-02-28 05:00:00+00:00
 croniter -> 2024-03-04 04:30:00+00:00
 cronsim  -> 2024-03-01 04:30:00+00:00

bad for 40 02 */3 * *
 from     -> 2024-02-28 03:00:00+00:00
 croniter -> 2024-03-04 02:40:00+00:00
 cronsim  -> 2024-03-01 02:40:00+00:00

bad for 40 0 */3 * *
 from     -> 2024-02-28 01:00:00+00:00
 croniter -> 2024-03-04 00:40:00+00:00
 cronsim  -> 2024-03-01 00:40:00+00:00

bad for 40 1 */10 * *
 from     -> 2024-02-21 02:00:00+00:00
 croniter -> 2024-03-11 01:40:00+00:00
 cronsim  -> 2024-03-01 01:40:00+00:00

bad for 5 20 */5 * *
 from     -> 2024-02-26 21:00:00+00:00
 croniter -> 2024-03-06 20:05:00+00:00
 cronsim  -> 2024-03-01 20:05:00+00:00

evanpurkhiser added a commit that referenced this issue Nov 6, 2024
We're replacing croniter with cronsim as it's bit more of a modern
library. This replaces the single usage outsode of the cron monitors
feature of sentry.

See GH-80208
evanpurkhiser added a commit that referenced this issue Nov 6, 2024
We're replacing croniter with cronsim as it's bit more of a modern
library. This replaces the single usage outsode of the cron monitors
feature of sentry.

See GH-80208
evanpurkhiser added a commit that referenced this issue Nov 6, 2024
We're replacing croniter with cronsim as it's bit more of a modern
library. This replaces the single usage outsode of the cron monitors
feature of sentry.

See GH-80208
evanpurkhiser added a commit that referenced this issue Nov 6, 2024
We're replacing croniter with cronsim as it's bit more of a modern
library. This replaces the single usage outsode of the cron monitors
feature of sentry.

See GH-80208
@evanpurkhiser
Copy link
Member Author

This is complete.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants