Question: Distributed Workers #371

Open
Zibbp opened this issue Feb 17, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Zibbp
Owner

Zibbp commented Feb 17, 2024

I hinted at the possibility of having distributed workers. This would mean a worker container could be hosted elsewhere, say on a different server, and either perform all queue tasks or be limited to heavy-duty tasks such as the chat render. It could also help spread the load of the chat render task if you have many archives going at once. At the very least, it would be beneficial to get the worker and server out of the single container they share now; having separate containers for the two makes things much easier to manage. The worker container would require access to the following (a rough sketch of such a worker is below the list):

  • Your /vods mount
  • Database
  • Temporal
  • A new gRPC connection for server <-> worker communications (basically another port to expose in the api container)
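
Roughly, a standalone worker could boil down to something like this with the Temporal Go SDK. This is only a sketch: the task queue name and the registered workflow/activity are placeholder stand-ins for the real archive tasks.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// RenderChatWorkflow and RenderChatActivity are hypothetical stand-ins
// for the real archive workflow and chat render task.
func RenderChatWorkflow(ctx workflow.Context, vodID string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 4 * time.Hour}
	ctx = workflow.WithActivityOptions(ctx, ao)
	return workflow.ExecuteActivity(ctx, RenderChatActivity, vodID).Get(ctx, nil)
}

func RenderChatActivity(ctx context.Context, vodID string) error {
	// The real activity would run the chat render here.
	return nil
}

func main() {
	// Connect to the same Temporal server the api container already uses.
	c, err := client.Dial(client.Options{HostPort: "temporal:7233"})
	if err != nil {
		log.Fatalf("unable to reach Temporal: %v", err)
	}
	defer c.Close()

	// Poll only a "heavy" task queue so this worker can be dedicated to
	// chat renders while another worker handles everything else.
	w := worker.New(c, "archive-heavy", worker.Options{})
	w.RegisterWorkflow(RenderChatWorkflow)
	w.RegisterActivity(RenderChatActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalf("worker exited: %v", err)
	}
}
```

A worker limited to heavy-duty tasks would simply poll a different task queue than the general-purpose one.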

Is there any interest in the community for this feature?

I've also been thinking of moving away from the /tmp directory that all files are currently downloaded into. Instead, all downloaded files would be placed in their final destination, /vods/channel/id/..., rather than /tmp. This has some pros and cons. An obvious pro is that if the container gets restarted, the previous task's data can be pulled from the persistent /vods directory. A con is that the large video would be copied twice: once after the live stream download finishes, and again after the post-process video task completes. Any opinions about this?

@Zibbp Zibbp added the enhancement New feature or request label Feb 17, 2024
@Zibbp Zibbp pinned this issue Feb 17, 2024
@russelg

russelg commented Feb 18, 2024

I can't say I have a use case for distributing the workers, mostly because I don't use the chat rendering. However, if splitting the worker(s) into a different container makes things easier, then I don't see an issue with it.

For the /tmp directory matter, I store all my VODs in Backblaze B2 and use rclone for mounting the folder (I had issues with vod segments going missing during copy using goofys/s3fs). In this case, temporary video files being uploaded to B2 seems like a waste. I have a mount for /tmp, so the previous task's data can already be persisted and restored without a problem.

@Aerglonus

For the workers being distributed to different containers, I don't have an issue since I don't use the chat rendering. As russelg said, if this makes things easier I don't mind.

As for this:

I've also been thinking of going away from the /tmp directory that all files are downloaded into. Instead, all download files would be placed in their final destination, /vods/channel/id/..., rather than /tmp. This has some pros and cons. An obvious pro being if the container gets restarted, the previous task's data can be pulled from the persistent /vods directory.

Isn't this happening already if you mount a /tmp directory from the host? The only thing is that it's just not organized, and what you suggest would take care of that.

Why not just make it a requirement instead of being optional like it is now? Instead of using the /tmp directory, tell the user to mount a specific directory for the task data and organize everything by channel, for example ./tasks/channel/task_id/data, and then move everything to the final destination /vods/channel/....

Also

A con being it would result in the large video being copied over twice. Once after the live stream download finishes, and another after the post-process video task is complete

Why is this? I thought the video download and convert were done in place in /tmp and only the final file was moved to the final destination. Or did I understand this wrong?

@russelg

russelg commented Feb 19, 2024

Why is this? I thought the video download and convert were done in place in /tmp and only the final file was moved to the final destination. Or did I understand this wrong?

That's exactly what Zibbp is proposing changing here. If these changes are made, /tmp will never be used, and the final vod directory (/vods/channel/....) will be used for the temp files while downloading/converting as well.

@Zibbp
Owner Author

Zibbp commented Feb 19, 2024

Why is this? I thought the video download and convert were done in place in /tmp and only the final file was moved to the final destination. Or did I understand this wrong?

That's exactly what Zibbp is proposing changing here. If these changes are made, /tmp will never be used, and the final vod directory (/vods/channel/....) will be used for the temp files while downloading/converting as well.

Correct. If I did implement distributed workers, worker B might need the downloaded video file from worker A, which is why I suggested placing temp files in the final /vods directory. It seems like I can force a workflow to run on a single worker, so this likely wouldn't be needed if I do implement distributed workers (though it would restrict flexibility, since all steps of the video archive would need to happen on that specific worker).
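
For reference, pinning everything to one worker would look roughly like this with Temporal sessions. The activity names here are placeholders, and the worker hosting them would need EnableSessionWorker turned on.

```go
package archive

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// DownloadVideo and PostProcessVideo are hypothetical activity stand-ins.
// The worker that hosts them must be started with
// worker.Options{EnableSessionWorker: true}.
func DownloadVideo(ctx context.Context, vodID string) error    { return nil }
func PostProcessVideo(ctx context.Context, vodID string) error { return nil }

func ArchiveVideoWorkflow(ctx workflow.Context, vodID string) error {
	// A session pins every activity below to the worker that created it,
	// so the file downloaded into that worker's temp directory is still
	// there when post-processing runs.
	so := &workflow.SessionOptions{
		CreationTimeout:  10 * time.Minute,
		ExecutionTimeout: 48 * time.Hour,
	}
	sessCtx, err := workflow.CreateSession(ctx, so)
	if err != nil {
		return err
	}
	defer workflow.CompleteSession(sessCtx)

	ao := workflow.ActivityOptions{StartToCloseTimeout: 24 * time.Hour}
	sessCtx = workflow.WithActivityOptions(sessCtx, ao)

	if err := workflow.ExecuteActivity(sessCtx, DownloadVideo, vodID).Get(sessCtx, nil); err != nil {
		return err
	}
	return workflow.ExecuteActivity(sessCtx, PostProcessVideo, vodID).Get(sessCtx, nil)
}
```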

Why not just make it a requirement instead of being optional like it is now? Instead of using the /tmp directory, tell the user to mount a specific directory for the task data and organize everything by channel, for example ./tasks/channel/task_id/data, and then move everything to the final destination /vods/channel/....

That's a good idea. I shouldn't be storing these files in the filesystem's /tmp directory. A more suitable directory that is mounted on the host makes sense as a strong recommendation.
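
The move step out of that mounted task directory could look roughly like the sketch below. The paths are placeholders; the copy fallback is where the second write shows up when the task directory and /vods sit on different filesystems (e.g. a local disk and an rclone mount).

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// moveFile moves src into destDir, falling back to copy+delete when the
// two paths live on different filesystems (os.Rename fails across devices).
func moveFile(src, destDir string) error {
	if err := os.MkdirAll(destDir, 0o755); err != nil {
		return err
	}
	dest := filepath.Join(destDir, filepath.Base(src))

	if err := os.Rename(src, dest); err == nil {
		return nil // same filesystem: effectively free
	}

	// Cross-device: the file has to be written a second time.
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Close(); err != nil {
		return err
	}
	return os.Remove(src)
}

func main() {
	// Hypothetical paths: host-mounted task scratch space -> final /vods layout.
	err := moveFile("/data/tasks/channel/task_id/video.mp4", "/vods/channel/id")
	fmt.Println(err)
}
```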

@Shockkota

Amusing that I see this thread when I've been considering a worker machine for a different reason.

A personal use case would be running an instance in the cloud to monitor and grab the streams/vods (for availability and uptime purposes), and then pull those down to a local instance where my archival storage resides.

The possible headache I haven't quite figured out with that yet is migrating the streams from the cloud instance to a local one. I would love a way to output/ingest a file vs. having to share my archival streams/vods mount.

@Zibbp
Owner Author

Zibbp commented Apr 10, 2024

A personal use case would be running an instance in the cloud to monitor and grab the streams/vods (for availability and uptime purposes), and then pull those down to a local instance where my archival storage resides.

Interesting setup. A separate worker and server setup probably wouldn't solve what you're trying to accomplish if you don't want to share your vod mount.

You could set up something similar today: run one instance in the cloud and another locally. The local instance could rsync downloaded vods over SSH. The vods SQL table would also need to be dumped and imported on the local instance periodically.
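
Something along these lines (hostnames, SSH setup, database user, and database name are all placeholders) could be run periodically on the local side:

```go
// Hypothetical sync job for the "one instance in the cloud, one local" setup:
// pull finished vods with rsync over SSH, then copy the vods table rows.
package main

import (
	"log"
	"os/exec"
)

func run(name string, args ...string) {
	cmd := exec.Command(name, args...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatalf("%s failed: %v\n%s", name, err, out)
	}
}

func main() {
	// 1. Mirror the cloud instance's /vods into the local archive.
	run("rsync", "-az", "--partial", "-e", "ssh", "cloud-host:/vods/", "/vods/")

	// 2. Dump only the vods table on the cloud database and pull the dump down.
	run("ssh", "cloud-host",
		"pg_dump -U ganymede -t vods --data-only ganymede > /tmp/vods.sql")
	run("rsync", "-az", "-e", "ssh", "cloud-host:/tmp/vods.sql", "/tmp/vods.sql")

	// 3. Load the rows into the local database.
	run("sh", "-c", "psql -U ganymede ganymede < /tmp/vods.sql")
}
```

The import would also need to be made idempotent (e.g. skip rows that already exist) so it can be re-run safely.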

@Shockkota

Interesting setup. A separate worker and server setup probably wouldn't solve what you're trying to accomplish if you don't want to share your vod mount.

You could set up something similar today: run one instance in the cloud and another locally. The local instance could rsync downloaded vods over SSH. The vods SQL table would also need to be dumped and imported on the local instance periodically.

Yep, it's the SQL side of it I haven't looked into enough to work out yet. It would be cool if Ganymede could periodically scan the vods folder and import VODs that match its formatting, similar to adding media to Plex or Jellyfin.
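
Something like the sketch below is what I'm imagining. It assumes each archived VOD directory contains an info JSON next to the video; the -info.json suffix is just a guess at the on-disk layout.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// findImportCandidates walks the vods folder and returns directories that
// look like archived VODs (they contain an info JSON), so they could be
// checked against the database and imported if missing — similar to a
// Plex/Jellyfin library scan.
func findImportCandidates(root string) ([]string, error) {
	var candidates []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(d.Name(), "-info.json") {
			candidates = append(candidates, filepath.Dir(path))
		}
		return nil
	})
	return candidates, err
}

func main() {
	dirs, err := findImportCandidates("/vods")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, dir := range dirs {
		// A real importer would parse the JSON and insert a row into the
		// vods table if the ID isn't already present.
		fmt.Println("candidate:", dir)
	}
}
```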

@Zibbp Zibbp unpinned this issue Nov 24, 2024