-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port builds.sh to Go (huge speed increase) #5
Conversation
(golang/go#27179 is a useful Go link) |
I'd like to get #7 in before this, then rebase this on that -- I'm still considering whether it's worth implementing ordered maps via a custom type so we can get back to the ideal state (I tried a few libraries but didn't really like any of them), but either way that'll help us make sure this is correct (while showing off how much faster it is via the job being faster) and if we decide the |
This takes our current subset from ~16m down to ~6m
…hat doesn't require postprocessing) This takes the current subset calculation from ~6m down to ~10s
…ilds" compilation too
Writing a generic JSON round-tripping "ordered map" implementation that meets our needs was pretty straightforward (less than 100 lines total), so now we don't even have to live with a change to any of our |
…maps in a defined order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing, thanks!
Just a few minor nits and some questions.
"github.com/sirupsen/logrus" // this is used by containerd libraries, so we need to set the default log level for it | ||
) | ||
|
||
var concurrency = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why var
and not const
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I moved it up so it'd be more obvious and didn't give it much thought -- I ultimately wanted to make it a CLI flag, but changing it doesn't really change very much (it makes the whole thing slightly faster or slower, but still generally less than a minute so still subject to / limited by the 429s which are based on requests per minute). 😅
|
||
go func() { | ||
// Go does not have ordered maps *and* is complicated to read an object, make a tiny modification, write it back out (without modelling the entire schema), so we'll let a single invocation of jq solve both problems (munging the documents in the way we expect *and* giving us an in-order stream) | ||
jq := exec.Command("jq", "-c", ".[] | (.arches | to_entries[]) as $arch | .arches = { ($arch.key): $arch.value }", sourcesJsonFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the issue with having the JSON sorted to begin with and then always outputting it sorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order in sources.json
is actually significant (it's what allows us to later assume that a parent referenced by sourceId
is either already calculated or already in-progress so we can either wait for it or have it already), and if we sorted it we'd lose that property. We could change the format of sources.json
to be a list or something, but given that we can have order-preserving and the cost of this is effectively nothing, it doesn't seem worth doing so IMO given that it being a map gives us the benefit of direct lookups by sourceId/buildId (which we then use in our builds).
That also only solves the ordering problem, not the "read, modify in a small way, write back out" problem (we'd have to define a full schema for that sources object, which we'll need eventually if we port sources.sh
into Go as well, but we don't need right now for the current problem).
} | ||
for build.Build.Arch = range source.Arches { | ||
// I really hate Go. | ||
// (just doing a lookup of the only key in my map into a variable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps soon, https://pkg.go.dev/golang.org/x/exp/maps#Keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! I was angry that https://pkg.go.dev/maps didn't include that (yet), but perhaps there is still hope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, those functions being included in stdlib is blocked on the "iterators" proposals (which are blocked on "for range
over a function" that was accepted but still experimental and requires GODEBUG
incantations to use), but an iterator would honestly be even better for this use case than a slice of keys so I guess I'll sit down and try to be patient. 😂
"$dir/.go-env.sh" go build -v -o builds builds.go | ||
ls -l "$dir/builds" | ||
} >&2 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makefile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but I'm not sure it actually makes this much simpler since it still has to list all the files that should cause the binary to rebuild, and it's really creating the container (via .go-env.sh
) that I was trying to optimize not doing unless necessary, not the go build
(which is otherwise really fast and I've have no problem just running unconditionally every time). I also don't want to encode an assumption of make
being available on the host in scripts like this (as a general rule), so even with a Makefile
I'd still be invoking it via .go-env.sh
and be back to the same problem. 🙈
Instead of maintaining another file that then splits the logic for these things, I'd probably instead go the other direction and just pay the cost of the container overhead on every startup (since it should typically only be real overhead the first time, when it has to pull the image).
if i > 0 { | ||
buf.WriteByte(',') | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
) | ||
|
||
func assert[V comparable](t *testing.T, v V, expected V) { | ||
t.Helper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know about Helper()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sorry for the hassle
I started by doing a naive port and got a very rough ~2x increase (tens of minutes down to just minutes), and then added a bit of parallelism (still roughly the same overall implementation -- if you want to review that commit, I'd suggest doing so with ignored whitespace) and managed to get down to seconds. It does run into 429s periodically, which I think is probably expected (and could be mitigated by further caching the
oisupport/staging-XXX:YYY
tag lookups more aggressively, since they will only change if we're deleting them for an explicit cache bust / rebuild). Additionally, this preserves the output order.I've done a before/after on the generated file, and all the differences are what I'd expect -- the calculated buildIds fordrupal
andnotary
change, which will cause them to rebuild. That happens because the old implementation was able to be order-preserving on parent -> digest mappings that get included in the buildId calculation, where Go doesn't have order-preserving maps so we get sorted parent keys instead (and thus a few new buildIds, but mostly 100% unchanged). For builders that still preserve build cache (everything not on GHA), those should even be fully cached builds.(added a simple "ordered map" implementation that avoids
buildId
changes)