AttributeStore memory tweaks #583

cldellow · 2023-11-17T05:42:02Z

This fixes a bug where:

obj:Attribute('name', 'some value')
obj:Attribute('name', 'some other value')

sometimes resulted in the first value being saved, sometimes the second. Now the last value always wins.

There are also some memory reductions: about 2,300MB less memory is needed for GB. This comes from two locations:

AttributeStore - 1,300MB
OsmMemTiles - 1,000MB

The PR does a few things.

Rather than store a whole std::string in AttributePair to identify the key (name:latin, kind, iata, etc), it identifies the key by a uint16_t. This limits Tilemaker to only 64K keys -- probably fine? Shortbread uses about 50, for reference.
Rather than store a whole AttributePair in an AttributeSet, it references them by a vector of uint32_t. This limits Tilemaker to 4B AttributePairs. This might be an issue -- but it's easily extended to more if needed. I see elsewhere that Tilemaker is limited to 1B AttributeSets -- probably if one has an issue, the other will, too.
I lied, AttributeSet only sometimes uses an actual vector of uint32_ts -- most of the time, a vector is overkill, as items only have a handful of attributes. Instead, AttributeSet typically uses the 24 bytes that a vector would occupy to store an array of 4 shorts and 4 ints†. Only if this array gets exhausted does it change to using a vector.
vector_tile::Tile_Value turns out to be quite big -- it's 96 bytes. I replaced it with a union of std::string, float and bool. I didn't notice any negative impact to PBF generation at the end from newing it up on a per-layer basis.
Recognize when a user calls Layer with the same geometry, and avoid processing/storing the geometry a second time. For example, when a profile writes a geometry for a river as well as for its name.

† The system tries to assign IDs that would fit in a short to those AttributePairs that it thinks will be very popular, like rank=1, amenity=toilets, indoor=false and so on. This lets most AttributeSets store their pairs without having to allocate a vector.

I think AttributeStore lives forever, and AttributeSets are immutable once added to it, so we can avoid the copy.

There are relatively few unique key values for attributes, e.g. `kind`, `name`, `admin_level`. The Shortbread schema has only ~50 or so. I imagine OMT is similar, but haven't checked. We generate lots of AttributePairs -- on the order of tens of millions for GB, and std::string has an overhead of 32 bytes. By using a string pool and storing only an offset into it, we can save a few hundred MB of RAM.

This is the groundwork for implementing two future improvements: - hot/cold pairs: there is a bimodal distribution of attribute frequency. `landuse=wood`, `tunnel=0` are often duplicated. `name=Sneed's Seed & Feed` is not. In the future, we'll try to re-use the "hot" pairs to avoid paying the cost of an AttributePair for them. - "short vectors" - similar to the short string optimization, we should be able to pack up to 6 pairs (3 hot, 3 cold) in the overhead that a vector would otherwise use. As it stands, this commit increases memory usage. But we'll claw a lot of it back, and then some.

If a pair looks like it might be re-usable, put it in a special shard and be able to re-use it. The special shard is limited to max 64K items, teeing up future work to have a simple vector for AttributeSets with few pairs.

The stats I was looking at were counting AttributePairs via AttributeSets, which of course presents a misleading image of how many duplicate AttributePairs there are, because by that point, they've already been deduped. De-duping doesn't add that much runtime overhead--and it could probably be improved by someone who knows more C++ concurrency tricks than me.

`Tile_Value` is a really memory-expensive object. Since we maintain long-lived references to the canonical AttributePair, we can store pointers to save a bit more memory. Now that value->AttributePairs are guaranteed to be 1:1, we can do our debug statistics on ints, and translate to pairs only when writing to stdout.

Doesn't appreciably affect runtime, saves a bit of memory.

Now that there is a 1:1 mapping between values and AttributePairs, it's trivial to compute the hash on demand.

Also const-ify a few things

Tile_Value is a big union that takes up 96 bytes, but for our purposes, we're happy with a union of string, float and bool -- which can be expressed in 28 bytes. We need a discriminator variable, but due to alignment, that's free. I also consider `boost::variant<bool, float, string>`, but it seemed to take 40 bytes. I worried that not having a pool of Tile_Values would affect PBF writing time, but it seems unaffected.

This is useful for ranks, which run from 1..25

`vector<uint32_t>` takes 24 bytes just to store its internal pointers. If you actually want to store a `uint32_t` in it, it'll then allocate some memory on the heap, taking a further 32-64 bytes depending on STL and malloc implementations. 56-88 bytes! For a single `uint32_t`! Outrageous. Instead, store references to pair indexes in an array of shorts. If the pairs don't fit in the array, upgrade it to a vector. Since we previously arranged for very popular pairs like `amenity=toilets` to have small indexes, our array of shorts is capable of storing between 4 and 8 pairs before we need to upgade to a vector. Most AttributeSets will not need to use a vector.

AttributeKeyStore/AttributePairStore have the same lifetime as AttributeStore, so just make them owned by it. This results in slightly more convoluted code, but avoids having them floating around as globals.

x ^ y will only use as many bits as max(x, y), but tiles only use the full 32-bit space at z16, so we're leaving a lot of the hash space on the table.

Previously, if you set the same key to different values, it was not guaranteed that the last value written would win.

set seems a bit like overkill - we already know the items are unique, and the consumer is likely just going to iterate over them

also avoid hardcoding 12

This reverts commit 7570737. Oops, I think this change isn't meaningful, and is a result of me misreading the original code. It might still be an improvement to do something like `hash(x << 16) ^ hash(y)`, since the default TileCoordinate is only 16 bits, but that can be considered independent of this PR.

systemed · 2023-11-17T16:34:26Z

This looks really good - thank you. I'll have a proper look at it this weekend. I like the optimisation for popular pairs - very clever.

They're long-lived, so pass pointers

I'm slowly remembering how to write concurrent code...

This should reduce futex contention significantly. I'll apply the same change for AttributePairStore's shard 0, then measure.

Per https://stackoverflow.com/questions/36320008/whats-the-default-value-for-a-stdatomic, they aren't initialized by default. Somewhat surprised this didn't result in crashes.

A common pattern is: ```lua way:Layer("waterway", false) ... way:Layer("waterway_names", false) ``` Previously, we'd process the geometry twice, and store a second copy of it in memory. Instead, re-use the previously stored geometry. This saves another ~1GB of memory for the GB extract. It doesn't seem to affect runtime - I think we only re-use linestrings, and linestrings are relatively cheap to do `is_valid` on. It seems like with the rest of the work on this branch, the `OutputObjectXyz` classes are very thin -- inspecting `geomType` in order to construct the right was a bit tedious, so I removed them.

systemed · 2023-11-19T17:59:09Z

Great Britain:

master: 7:21.83, 14.5GB
refactor_geometries: 7:00.72, 14GB
this PR: 6:33.40, 11.5GB

Germany:

master: 13:56.06, 30.1GB
master with --store: 15:06.89, 28.1GB
this PR: 12:29.84, 23.7GB
this PR with --store: 12:54.88, 21.6GB

That's an amazing improvement. Thank you!

cldellow added 27 commits November 16, 2023 17:11

make AttributeStore::get const

697b1cf

I think AttributeStore lives forever, and AttributeSets are immutable once added to it, so we can avoid the copy.

Have a "hot" shard for popular pairs

a80dd09

If a pair looks like it might be re-usable, put it in a special shard and be able to re-use it. The special shard is limited to max 64K items, teeing up future work to have a simple vector for AttributeSets with few pairs.

treat 0 as a sentinel

66adb1d

use boost::container::flat_map over std::map

1f08a81

Doesn't appreciably affect runtime, saves a bit of memory.

don't memoize hash function

39f1c43

Now that there is a 1:1 mapping between values and AttributePairs, it's trivial to compute the hash on demand.

output_object: avoid Tile_Value temporaries

2641612

Also const-ify a few things

adjust headers, remove unneeded rng

60c36f3

any integer 0 <= 25 is eligible for hot pool

f06c2ac

This is useful for ranks, which run from 1..25

simplify AttributeKeyStore

13f72ae

use camelCase

1f43dbc

re-write to avoid static lifetime

23b6a90

AttributeKeyStore/AttributePairStore have the same lifetime as AttributeStore, so just make them owned by it. This results in slightly more convoluted code, but avoids having them floating around as globals.

reduce lock contention

a4b55d3

Improve TileCoordinates hash function

7570737

x ^ y will only use as many bits as max(x, y), but tiles only use the full 32-bit space at z16, so we're leaving a lot of the hash space on the table.

d'oh, avoid looking up the key name needlessly

8ddf6de

change AttributeXyz(...) to be last-written wins

f9a1931

Previously, if you set the same key to different values, it was not guaranteed that the last value written would win.

remove misleading comment

f9ccf3c

include deque

6ba6db2

include map

2f61bc9

return vector, not set

3e6df15

set seems a bit like overkill - we already know the items are unique, and the consumer is likely just going to iterate over them

avoid GNU-specific initializer

9436324

also avoid hardcoding 12

cldellow added 2 commits November 17, 2023 20:28

remove dead code

063bc3c

avoid copying AttributePairs

6f31fc3

They're long-lived, so pass pointers

cldellow added 5 commits November 17, 2023 22:02

OutputObjects - greatly reduce need for locks

557a077

I'm slowly remembering how to write concurrent code...

AttributeKeyStore: use a TLS cache

75ff493

This should reduce futex contention significantly. I'll apply the same change for AttributePairStore's shard 0, then measure.

AttributePairStore: reduce lock contention

de9e4d6

ensure atomics are initialized

a285c93

Per https://stackoverflow.com/questions/36320008/whats-the-default-value-for-a-stdatomic, they aren't initialized by default. Somewhat surprised this didn't result in crashes.

systemed merged commit 3b3b8f1 into systemed:refactor_geometries Nov 19, 2023
5 checks passed

cldellow mentioned this pull request Nov 30, 2023

Remove output object ref #595

Merged

cldellow mentioned this pull request Feb 2, 2024

MAC OS issue with types during the make command #664

Closed

cldellow mentioned this pull request Sep 9, 2024

Bug: SegFault caused by invalid string #750

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AttributeStore memory tweaks #583

AttributeStore memory tweaks #583

cldellow commented Nov 17, 2023 •

edited

Loading

systemed commented Nov 17, 2023

systemed commented Nov 19, 2023

AttributeStore memory tweaks #583

AttributeStore memory tweaks #583

Conversation

cldellow commented Nov 17, 2023 • edited Loading

systemed commented Nov 17, 2023

systemed commented Nov 19, 2023

cldellow commented Nov 17, 2023 •

edited

Loading