P1 - Separate Cron Tasks From Impact-Graph #1656

Open
mhmdksh opened this issue Jun 26, 2024 · 11 comments
Labels: devops (Issues related to Devops), kubernetes

Comments

@mhmdksh
Collaborator

mhmdksh commented Jun 26, 2024

The goal is to make impact-graph replicable so that it can run in a high-availability environment. That means running multiple impact-graph instances at the same time, all connected to the same DB and performing the same operations, scaled according to load and traffic.

One thing preventing this is the large number of cronjobs embedded inside impact-graph: every replica would run the same scheduled jobs, which makes it impossible to run in a replicable manner.

Something that can help is separating these crons from impact-graph and running them as a separate service that talks to an endpoint on impact-graph, so that the jobs are triggered from outside impact-graph.

This will bring us one step closer to running impact-graph on high-availability infrastructure (like Kubernetes or others).
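
To make the proposal concrete, here is a minimal sketch of what the separate trigger side could look like. Everything in it is an assumption for illustration: the endpoint paths, the job names, the bearer-token header and the `triggerJob` helper are hypothetical and do not exist in impact-graph today. The HTTP call is injected so the sketch stays independent of any particular HTTP client.

```typescript
// Hypothetical sketch: none of these names or paths exist in impact-graph.
// Minimal shape of an HTTP POST function, so any client can be plugged in.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Assumed mapping from a job name to an internal endpoint on impact-graph.
const JOB_ENDPOINTS: Record<string, string> = {
  verifyDonations: "/internal/jobs/verify-donations",
  syncGivingBlocks: "/internal/jobs/sync-giving-blocks",
};

// Called by an external scheduler; impact-graph itself holds no timers.
async function triggerJob(
  baseUrl: string,
  jobName: string,
  secret: string,
  post: HttpPost,
): Promise<number> {
  const path = JOB_ENDPOINTS[jobName];
  if (path === undefined) {
    throw new Error(`Unknown job: ${jobName}`);
  }
  const response = await post(`${baseUrl}${path}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${secret}` },
  });
  if (!response.ok) {
    throw new Error(`Job ${jobName} failed with HTTP ${response.status}`);
  }
  return response.status;
}
```

With something like this, the schedule lives entirely in the external scheduler, and whichever impact-graph replica receives the request runs the job once.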

@jainkrati @aminlatifi @mohammadranjbarz @CarlosQ96 I would appreciate your opinion on this

@mhmdksh
Collaborator Author

mhmdksh commented Jun 26, 2024

@mohammadranjbarz I'll need your help to group all the cronjobs currently embedded in impact-graph that we can detach. My current list of the cronjobs is below:

CHECK_PROJECT_VERIFICATION_STATUS_CRONJOB_EXPRESSION
CHECK_USERS_SUPER_TOKEN_BALANCES_CRONJOB_TIME
DONATION_SAVE_BACKUP_CRONJOB_EXPRESSION
FILL_BLOCK_NUMBERS_OF_SNAPSHOTS_CRONJOB_EXPRESSION
FILL_POWER_SNAPSHOT_BALANCE_CRONJOB_EXPRESSION
IMPORT_LOST_DONATIONS_CRONJOB_EXPRESSION
INSTANT_BOOSTING_UPDATE_CRONJOB_EXPRESSION
MAKE_UNREVIEWED_PROJECT_LISTED_CRONJOB_EXPRESSION
MATCH_DRAFT_DONATION_CRONJOB_EXPRESSION
REVIEW_OLD_GIV_PRICES_CRONJOB_EXPRESSION
SYNC_GIVING_BLOCKS_CRONJOB_EXPRESSION
SYNC_IDRISS_TWITTER_DONATIONS_CRONJOB_EXPRESSION
SYNC_POIGN_ART_CRONJOB_EXPRESSION
UPDATE_POWER_ROUND_CRONJOB_EXPRESSION
VERIFY_DONATION_CRONJOB_EXPRESSION
VERIFY_RECURRING_DONATION_CRONJOB_EXPRESSION

@mohammadranjbarz
Collaborator

What do you mean by separating them?
Do you want to create another repo and move them there? That would be very hard, because they use lots of common functions and entities. My suggestion is to bring up multiple instances of impact-graph, with the cronjobs enabled in one of them and disabled in the others. That is easier to manage.
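
The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the `ENABLE_CRONJOBS` flag and both function names are hypothetical, and impact-graph may gate its jobs differently.

```typescript
// Hypothetical flag name; impact-graph may not define ENABLE_CRONJOBS.
function cronJobsEnabled(env: Record<string, string | undefined>): boolean {
  return (env["ENABLE_CRONJOBS"] ?? "false").toLowerCase() === "true";
}

// Starts the given schedulers only on the instance where the flag is set.
// Returns how many schedulers were started.
function startCronJobsIfEnabled(
  env: Record<string, string | undefined>,
  schedulers: Array<() => void>,
): number {
  if (!cronJobsEnabled(env)) {
    return 0; // API-only replica: serves traffic, runs no jobs
  }
  for (const start of schedulers) {
    start();
  }
  return schedulers.length;
}
```

One instance would be deployed with `ENABLE_CRONJOBS=true` and all others without it, so the code stays in one repo but the instances are no longer identical.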

@mhmdksh
Collaborator Author

mhmdksh commented Jun 26, 2024

@mohammadranjbarz Thanks for the suggestion, but I would say that making one version of impact-graph different from the others makes this whole idea of replicability worthless.

If we have to worry about which version of impact-graph decides which cronjobs are run, that is an additional complication in the way of making it ready for scaling.

@geleeroyale WDYT?

@mhmdksh
Collaborator Author

mhmdksh commented Jul 22, 2024

Mentioning @jainkrati @mohammadranjbarz @divine-comedian @aminlatifi @geleeroyale @Rolazo for more engagement.

@geleeroyale geleeroyale added the devops Issues related to Devops label Jul 22, 2024
@divine-comedian divine-comedian moved this from New Issues to Dev Research in All-Devs Jul 22, 2024
@divine-comedian
Collaborator

I think it sounds good, we do maintain a lot of cron jobs and we are planning to add more!

I don't have a good grasp on the amount of work this requires or what kind of work needs to happen.

What would be the definitive PROs and CONs of this change? How many dev hours do we estimate this work would need?

@geleeroyale
Collaborator

geleeroyale commented Jul 31, 2024

Look what I found - I will use this to move givfarm-notify jobs from an outdated server

https://github.com/mcuadros/ofelia

Edit: It's a bit of a tough system to set up, but I still like the possibilities. It's more geared towards running cron jobs in Docker containers (which was not my use case, as I wanted to run cron jobs on the host).

So for refactoring the jobs out of impact-graph while still using the container, this seems to be perfect.

@divine-comedian
Collaborator

@geleeroyale @mhmdksh what is the update on this issue?

@geleeroyale
Collaborator

This is a refactoring job for the developers. We are always happy to support, but our part would mostly be working with the consequences. @mhmdksh opened this issue because @jainkrati asked for ways to improve DApp performance, and this issue marks the most important step in preparing impact-graph for horizontal scaling.

@aminlatifi
Member

@divine-comedian we have implemented this on the qacc backend, which is a fork of impact-graph.

The DevOps team has set up the required env, and it has been up since yesterday.

It has higher availability and a shorter launch time.

@divine-comedian
Collaborator

Great! Thanks @aminlatifi. So, as I understand it, we would need to port the code already written for the qacc backend into impact-graph.

@mhmdksh
Collaborator Author

mhmdksh commented Nov 7, 2024

@divine-comedian Yes, exactly. To give more detail on what this setup could look like once implemented, it would bring the following benefits:

  1. impact-graph can run in a high-availability setup: multiple impact-graph backends run at the same time and act as one team of backends.
  2. Traffic is distributed among those backends by a load-balancing reverse proxy, instead of depending on a single instance.
  3. We can have regular health checks and an auto-healing process that automatically restarts any faulty backend member.
  4. Deployments benefit as well, since this allows zero-downtime deployments for impact-graph. When an update is pushed, the running instances are updated gradually, one by one. If the first update fails, the deployment stops and the other instances stay active. Users are not affected, because they keep being served by the instances that were not updated, and the devs get the chance to push a working update.
  5. The backend can also be deployed in multiple regions around the planet. With impact-graph instances in each region waiting for traffic, the service becomes global and faster for users everywhere, in contrast to Giveth being faster in the Euro region but not in other regions.
  6. We can stop getting the 500 errors that result from a faulty backend :)
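
The one-by-one update behaviour described in point 4 can be illustrated with a small simulation. This is a sketch under assumptions, not a deployment tool: the `Instance` type, the `rollingUpdate` function and the health-check callback are all hypothetical, and in practice an orchestrator such as Kubernetes would handle this.

```typescript
// Hypothetical model of a running backend replica.
type Instance = { name: string; version: string };

// Simulates a rolling update: instances are updated one at a time, and the
// rollout stops at the first instance whose new version fails its health check.
function rollingUpdate(
  instances: Instance[],
  newVersion: string,
  isHealthyAfterUpdate: (candidate: Instance) => boolean,
): { instances: Instance[]; completed: boolean } {
  const result: Instance[] = [];
  let completed = true;
  for (const instance of instances) {
    if (!completed) {
      // Rollout already stopped: remaining instances keep the old version.
      result.push({ name: instance.name, version: instance.version });
      continue;
    }
    const candidate: Instance = { name: instance.name, version: newVersion };
    if (isHealthyAfterUpdate(candidate)) {
      result.push(candidate);
    } else {
      // Failed update: keep the old version running and stop the rollout.
      completed = false;
      result.push({ name: instance.name, version: instance.version });
    }
  }
  return { instances: result, completed };
}
```

If the very first update fails, every instance is still on the old version and keeps serving users, which is the zero-downtime property described above.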

CC: @geleeroyale @jainkrati
