Use parallel processing #140

Closed
wants to merge 1 commit into from

Conversation

kjmeagher
Member

This changes the server to send a list of several p-frames to the clients. The clients process the frames in parallel using Python's multiprocessing module. The tables used by photospline are shared between the processes. The number of parallel trays can be controlled with the new option --parallel-trays.
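For illustration only, a minimal sketch of the kind of fan-out described above, using multiprocessing.Pool; the run_reco_tray helper and the file naming here are hypothetical stand-ins, not the PR's actual code:

```python
import pickle
from multiprocessing import Pool
from pathlib import Path


def run_reco_tray(pframe):
    """Hypothetical stand-in for the per-pixel icetray reconstruction."""
    return {"pframe": pframe}


def reco_one(args):
    """Reconstruct a single p-frame and pickle the result to its own file."""
    i, pframe = args
    outpath = Path(f"out{i:04}.pkl")
    with open(outpath, "wb") as f:
        pickle.dump(run_reco_tray(pframe), f)
    return outpath


def process_pframes(pframes, parallel_trays=4):
    """Run one worker process per tray over the batch of p-frames."""
    with Pool(processes=parallel_trays) as pool:
        return pool.map(reco_one, list(enumerate(pframes)))
```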

Member

@ric-evans ric-evans left a comment

What's the benefit of parallelizing within a client, in terms of empirical performance? Is the main lag in icetray I/O, or is it computational? I've assumed it's computational. With the ewms-pilot framework, if a single one of these "grouped" pixels causes an exception, or the pixel "group" lands on a bad worker node, then the whole group will need to be auto-re-enqueued instead of a single pixel.

@ric-evans ric-evans requested a review from dsschult March 1, 2023 22:46
@dsschult
Member

dsschult commented Mar 1, 2023

This should work. I'm just not terribly happy with doing multiple pixels per message, as it goes somewhat against the long-term vision for EWMS. I'd be happier with the ewms-pilot processing multiple messages in parallel (at least as long as the message processing time is >> 1 second, so it's actually compute-bound).

I assume it's using shared memory for the tables, and so has the potential problem of boost::interprocess screwing up the shared-memory lock? Is there any protection against that creating a black-hole node?

@ric-evans
Member

ric-evans commented Mar 1, 2023

I think parallelizing the client would be better accomplished by the ewms-pilot. Here, each client would create multiple processes, each with an icetray, for an arbitrary number of messages. This is currently in the planning stage.

@dsschult dsschult requested a review from briedel March 1, 2023 22:49
@dsschult
Member

dsschult commented Mar 1, 2023

Added @briedel as he asked for this.

@ric-evans
Member

ric-evans commented Mar 1, 2023

Worth pointing out: packing too much into a single message will eventually throttle the broker.

Similarly, this could slow down the overall performance, since we're waiting for all the pixels in a group to finish before sending the pixelrecos.

@briedel
Collaborator

briedel commented Mar 1, 2023

We are requesting 5 GB of RAM per pixel right now. This is really inefficient if we want to run this on the grid, and expensive if we run it in the cloud, and running photospline in shared-memory mode is really finicky on the client side. I'd rather have this inside icetray to keep the implementation details away from the pilot, beyond “batching” events together somehow.

@kjmeagher
Member Author

Based on the specs I got from @briedel, I identified three possible ways to accomplish parallel processing:

  1. Have skymap_scanner.client allocate the tables then call reco_icetray.reco_pixel() directly with subprocess
  2. Have skymap_scanner.client use IPC to pass multiple pframe packets to reco_icetray
  3. The solution in this PR

I went with 3 because it seemed like the easiest to implement. I don't know what ewms-pilot is, so I didn't consider it when choosing a solution. I would need more information to assess which one is best.

Also note that using subprocess for multiple trays is exactly the same as how triggered corsika works, and so far we haven't had any issues with shared-memory locks.

Also note that the current main branch calls recos.get_reco_interface_object(reco_algo) twice, which results in allocating the spline tables twice.
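One generic way to guarantee a single allocation, regardless of how many times the interface is requested, is to memoize the factory. This is only an illustration (the module path is assumed), and the thread below settles on a different fix inside the traysegment:

```python
import functools


@functools.lru_cache(maxsize=None)
def get_reco_interface_once(reco_algo: str):
    """Memoized wrapper so the spline tables behind a given reco algorithm
    are allocated at most once per process (illustrative only)."""
    from skymap_scanner import recos  # module path assumed from the discussion

    return recos.get_reco_interface_object(reco_algo)
```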

Member

@dsschult dsschult left a comment

Giving a proper review of the changes, I think the code generally looks fine. I'm not sure whether we want the default number of subprocesses to be 8 or 1, though.


```python
# loop over the pframes to run in parallel
for i, pframe in enumerate(pframes):
    outpath = Path(f"out{i:04}.pkl")
```
Member

Could this use a more unique path, either using tempfile or maybe with the parent process pid? I'm thinking of the possibility of running two instances side by side, which could happen in EWMS.
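For example, the output path could be made unique per client instance with tempfile plus the process pid; a sketch with a hypothetical prefix, not the PR's code:

```python
import os
import tempfile
from pathlib import Path

# One temporary directory per client process; the prefix embeds the pid, so two
# instances running side by side on the same node cannot collide.
OUT_DIR = Path(tempfile.mkdtemp(prefix=f"skyscan-client-{os.getpid()}-"))


def unique_outpath(i: int) -> Path:
    """Per-frame output file inside the process-unique temporary directory."""
    return OUT_DIR / f"out{i:04}.pkl"
```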

@ric-evans
Member

Also note that the current main branch calls recos.get_reco_interface_object(reco_algo) twice, which results in allocating the spline tables twice.

@kjmeagher thanks for finding this. This should be optimized to one allocation, definitely.

@ric-evans
Member

ric-evans commented Mar 2, 2023

@kjmeagher we talked this over and I think we've found a way to do this parallelization that's also friendly with message passing. If you wrap https://github.com/icecube/skymap_scanner/blob/main/skymap_scanner/client/client.py#L83-L96 with multiple async-loop calls, then the ewms-pilot will make a subprocess for each pilot. Each pilot processes a stream of messages (each with a single pixel & icetray). Note: you will need to give unique names to each pilot's in/out files, see https://github.com/icecube/skymap_scanner/blob/main/skymap_scanner/client/client.py#L73-L74.

The --parallel-trays CL arg would be given to the client instead of the server.

This solves the slow-pixel problem by removing the grouped-pixels constraint. It doesn't solve the bad-node problem or boost's potential shared-memory lock problem, but hopefully the tradeoff is worth it.

So, start_scan.py & reco_icetray.py can be unchanged from main. Though I do like your get_reco_interface_object-related optimizations.
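Roughly, the shape of that idea, with a hypothetical consume_and_reco() coroutine standing in for the message loop at client.py L83-96 (none of these names are from the actual code):

```python
import asyncio


async def consume_and_reco(tray_id: int) -> None:
    """Hypothetical stand-in for client.py's message loop: pull a pixel
    message, run the icetray reco, publish the pixelreco, repeat."""
    in_file = f"in-{tray_id}.pkl"    # unique per-tray in/out files,
    out_file = f"out-{tray_id}.pkl"  # cf. client.py L73-74
    ...  # message loop goes here


async def run_parallel_trays(n_trays: int) -> None:
    """Run n_trays message loops concurrently, one icetray each."""
    await asyncio.gather(*(consume_and_reco(i) for i in range(n_trays)))


# e.g. asyncio.run(run_parallel_trays(args.parallel_trays))
```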

@kjmeagher
Member Author

@ric-evans how does your plan propose to share memory? Will skymap_scanner.client allocate the memory? If so, wouldn't it be easier to call subprocess.Process to spawn the trays?

@ric-evans
Member

@ric-evans how does your plan propose to share memory? Will skymap_scanner.client allocate the memory? If so, wouldn't it be easier to call subprocess.Process to spawn the trays?

Right, this problem is specific to a particular reco, and we want to keep the scanner client generalized to support other styles of recos in the future. I think this is better suited to be solved when the tray is initiated, rather than in client.py. Are there shared memory tools within the photonics service we can use? Something that uses /dev/shm?
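As a generic illustration of what /dev/shm-backed sharing looks like from Python (this is the standard-library multiprocessing.shared_memory module, not photospline's own boost::interprocess mechanism):

```python
import numpy as np
from multiprocessing import shared_memory

# A table-like array placed in POSIX shared memory (backed by /dev/shm on
# Linux); another process in the same namespace can attach to it by name.
table = np.linspace(0.0, 1.0, 1000)
shm = shared_memory.SharedMemory(create=True, size=table.nbytes, name="demo_table")
shared = np.ndarray(table.shape, dtype=table.dtype, buffer=shm.buf)
shared[:] = table

# In another process:
#   shm = shared_memory.SharedMemory(name="demo_table")
#   view = np.ndarray((1000,), dtype=np.float64, buffer=shm.buf)

shm.close()
shm.unlink()  # remove the segment once every process is done with it
```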

@tianluyuan
Contributor

tianluyuan commented Mar 2, 2023

Also note that the current main branch calls recos.get_reco_interface_object(reco_algo) twice, which results in allocating the spline tables twice.

@kjmeagher thanks for finding this. This should be optimized to one allocation, definitely.

Yes, good catch. The cascade_service can be moved inside the traysegment, I think, and that should fix it.

@tianluyuan
Contributor

@ric-evans how does your plan propose to share memory? Will skymap_scanner.client allocate the memory? If so, wouldn't it be easier to call subprocess.Process to spawn the trays?

Right, this problem is specific to a particular reco, and we want to keep the scanner client generalized to support other styles of recos in the future. I think this is better suited to be solved when the tray is initiated, rather than in client.py. Are there shared memory tools within the photonics service we can use? Something that uses /dev/shm?

I think it already is using /dev/shm, and IIRC we were able to get that working with the containers. @briedel are jobs still hanging for you on the cloud?

@briedel
Collaborator

briedel commented Mar 2, 2023

They are not hanging anymore, but this is with special settings for /dev/shm so that things aren't shared between containers in a pod and the underlying node, i.e. we are turning the sharing off for all intents and purposes. I am not sure what will happen once we don't have that. As long as it is inside a container it should be fine, but if it is inside a pod (multiple containers) I am not sure.

@kjmeagher
Member Author

Right, this problem is specific to a particular reco, and we want to keep the scanner client generalized to support other styles of recos in the future. I think this is better suited to be solved when the tray is initiated, rather than in client.py. Are there shared memory tools within the photonics service we can use? Something that uses /dev/shm?

I don't know much about the internals of photospline, so maybe @cnweaver or @jvansanten could say if there is a way to access the same instance in photospline in multiple different processes.

@cnweaver

cnweaver commented Mar 2, 2023

if there is a way to access the same instance in photospline in multiple different processes

Yes, that was the entire point of the photospline v2 rewrite of the evaluation interface. I3PhotoSplineService should do this automatically, which is what was causing the hangs in containers, which are now fixed only by configuring the containers to have not-actually-shared shared memory. This is something that containers pretty much are designed to make difficult, so if people insist on using them, I don't think there's anything we can do/improve from the photospline/photonics side (in terms of enabling sharing; there are some race conditions which are exceedingly difficult to fix and can also cause hangs.)

@ric-evans
Member

if there is a way to access the same instance in photospline in multiple different processes

Yes, that was the entire point of the photospline v2 rewrite of the evaluation interface. I3PhotoSplineService should do this automatically, which is what was causing the hangs in containers, which are now fixed only by configuring the containers to have not-actually-shared shared memory. This is something that containers pretty much are designed to make difficult, so if people insist on using them, I don't think there's anything we can do/improve from the photospline/photonics side (in terms of enabling sharing; there are some race conditions which are exceedingly difficult to fix and can also cause hangs.)

Thanks, @cnweaver. So, as long as we're running our multiple processes within the same container, will photospline take care of shared memory between these processes?

@cnweaver

cnweaver commented Mar 2, 2023

I think that should work, provided the container-isolated shared memory is turned on. However, I don't think anyone has yet tried it.

@tianluyuan
Contributor

They are not hanging anymore, but this is with special settings for /dev/shm so that things aren't shared between containers in a pod and the underlying node, i.e. we are turning the sharing off for all intents and purposes. I am not sure what will happen once we don't have that. As long as it is inside a container it should be fine, but if it is inside a pod (multiple containers) I am not sure.

Does Python multiprocessing somehow get around this containerization? My impression was that it wouldn't, but I don't have a sense of how pods work on the cloud.

@ric-evans
Member

Using ewms-pilot's multitasking will enable parallelism while retaining the paradigm of loosely-coupled pixels; see #194.

@ric-evans ric-evans closed this May 11, 2023
@ric-evans ric-evans deleted the parallel_trays branch September 13, 2023 14:52