-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shard the node store to reduce memory usage #571
shard the node store to reduce memory usage #571
Conversation
It looks like PR systemed#499 adopts 36 bits as the max size of an OSM ID. The NodeStore currently uses a full 64 bits for these IDs. This PR changes it to shard the nodes across 16 collections (4 bits) and then store only the last 32 bits in the collection itself. This reduces memory usage for the NodeStore by 25%, without much impact on runtime. The CompactNodeStore is still much better, as it has no overhead and constant time lookups -- but I'm often lazy and not using a renumbered PBF file.
A similar improvement could be made for the way store, and is perhaps worth doing. My intuition was that there are 9x as many nodes as ways, and all nodes are stored during processing, whereas only some ways (but perhaps most in practice?) are stored during processing. The WayStore also doesn't (yet) show up in my massif profiles. On the other hand, an improvement for ways would also improve |
Neat idea. I'll do a bit of local testing but this looks good - thanks! The current max OSM way ID is somewhere in the region of 1200000000, i.e. not far above 2**30. So we could probably move WayStore IDs to plain unsharded 32-bit, at least for now. |
Oh! That's attractive for how easy it'd be to implement. Looking at https://wiki.openstreetmap.org/wiki/Stats#Accumulated_ways_and_relations, it looks like there are ~1B ways and they're added at a roughly steady rate of 100M/year, so plenty of headroom. Looking at my usual test cases, the ratio of stored ways to nodes seems actually much worse than 1:9:
So whereas the NodeStore change saved 324MB of RAM in the Kentucky case, a similar change for WayStore would only save 0.5MB, I think. Still not a bad idea, but I'll go fishing for some higher payoff things first. |
Oops, just noticed that I botched extracting this PR from my hacked-up development tree. Will fix it sometime this week and push an update. (I'm pretty sure https://github.com/systemed/tilemaker/pull/571/files#diff-935619b0dbb4696e9b8a02fa69ff07857ddda6efb35289406eb62a766b3b3087R83 ought to be using |
I had a go last night and it increased the memory usage for great-britain-latest.osm.pbf so that might explain it 😊 No problem - as and when. |
Also fix the way entries are inserted into the deques - I'd really done a good job of screwing up the PR. :)
Great Britain extract:
Germany extract:
So on balance this looks like a positive change! |
Great to hear! I hope to send a separate PR with some unrelated performance changes that should claw back the runtime cost that was lost here. |
I've done a little bit more testing and this is working brilliantly - thanks again! |
It looks like PR #499 adopts 36 bits as the max size of an OSM ID.
The NodeStore currently uses a full 64 bits for these IDs. This PR changes it to shard the nodes across 16 collections (4 bits) and then store only the last 32 bits in the collection itself.
This reduces memory usage for the NodeStore by 25%, without much impact on runtime.
The CompactNodeStore is still much better, as it has no overhead and constant time lookups -- but I'm often lazy and not using a renumbered PBF file.