Ruby: Improve performance of flow through (hash) splats #14229

hvitved · 2023-09-15T09:03:46Z

This PR builds on top of #14090. This DCA run shows the combined effect of those two PRs, while this DCA run is for this PR alone.

Hash splats

We support flow through hash-splats in all possible combinations:

def m1(key1:, key2:); end
args = { key1: a, key2: b }
m1(**args)

def m2(**kwargs); end
m2(key1: a, key2: b)

def m3(key1:, **kwargs); end
args = { key1: a }
m3(key2: b, **args)

The first case (m1) is handled by synthesizing an implicit hash-splat parameter implicit-hash-param inside m1, and then adding read steps out of implicit-hash-param into key1 (resp. key2), reading out a KnownElementContent corresponding to the symbol :key1 (resp. :key2):

def m1(**implicit-hash-param)
  key1 = implicit-hash-param[:key1]
  key2 = implicit-hash-param[:key2]
end

The second case is handled dually: we synthesize an implicit hash-splat argument implicit-hash-arg at the call site, and add store steps from the keyword arguments into implicit-hash-arg:

implicit-hash-arg = { key1: a, key2: b }
m2(**implicit-hash-arg)

The third case is handled using a combination, but care must be taken to ensure that the :key1 argument does not end up in **kwargs (we do this using clearsContent).

Reusing KnownElementContent for the implicit stores/reads makes semantically sense. However, performance-wise it is not great, as we are generating redundant stores/reads for the simple cases

def m4(key1:, key2:); end
m4(key1: a, key2: b)

where we will be able to match the keyword arguments directly against the keyword parameters. Not only are the stores/reads redundant, they also put extra pressure on the data flow computation, as we are generating many more cons-candidates.

In order to avoid this redundancy, we introduce a new type of content, HashSplatContent, which is used for the implicit argument stores (implicit-hash-arg), but which are not allowed in reads out of implicit-hash-param (KnownElementContent reads should still be allowed in order to handle the m1 example). This will effectively remove the redundancy in the m4 example, but in order for reads such as kwargs[:key1] inside m2 to work, such reads should be able to match both KnownElementContents and HashSplatContents, which we achieve using
ContentSet::getAReadContent.

Splats

Unlike hash-splats, where we can simply merge all keywords arguments/parameters with all hash-splat arguments/parameters, flow through splats is limited to methods containing zero or more positional parameters followed by at most one splat parameter, and calls containing zero or more positional arguments followed by at most one splat argument (since we are not tracking the length of arrays):

def m1(p1, p2); end
args = [a, b]
m1(*args)

def m2(*posargs); end
m2(a, b)

def m3(p1, **rest); end
args = [b]
m3(a, *args)

As for hash-splats, the first case (m1) is handled by synthesizing an implicit splat parameter implicit-splat-param inside m1, and then adding read steps out of implicit-splat-param into p1 (resp. p2), reading out a KnownElementContent corresponding to the integer 0 (resp. 1):

def m1(*implicit-splat-param)
  p1 = implicit-splat-param[0]
  p2 = implicit-splat-param[1]
end

The second case is handled dually: we synthesize an implicit splat argument implicit-splat-arg at the call site, and add store steps from the positional arguments into implicit-splat-arg:

implicit-splat-arg = [a, b]
m2(*implicit-splat-arg)

The third case is handled using a combination, but care must be taken to shift indices both at the call site (where a must have index 0 and b must have index 1 inside implicit-splat-arg), and similarly only b must end up in rest with re-shifted index 0.

As for hash-splats, reusing KnownElementContent for the implicit stores/reads makes semantically sense, but introduces the same kind of redundancy and performance issues:

def m4(p1, p2); end
m4(a, b)

The solution is similar: we introduce a new type of content, SplatContent, which is used for the implicit argument stores (implicit-splat-arg), but we record not only the index, but also whether the index was shifted or not (in the m3 example, only b will have a shifted index). Then only SplatContents with shifted indices (as well as KnownElementContents in the m1 example) need to read out of implicit-splat-param, as all other arguments will be able to match directly. Like for HashSplatContent, we also need to ensure that reads such as rest[0] inside m2 work, and we make use of ContentSet::getAReadContent for this as well.

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

ruby/ql/lib/codeql/ruby/dataflow/internal/DataFlowPrivate.qll

github-actions bot added the Ruby label Sep 15, 2023

hvitved force-pushed the ruby/splat-flow-performance branch 14 times, most recently from ad1d9a3 to 46cf10b Compare September 18, 2023 14:01

github-advanced-security bot found potential problems Sep 18, 2023

View reviewed changes

ruby/ql/lib/codeql/ruby/ApiGraphs.qll Fixed Show fixed Hide fixed

ruby/ql/lib/codeql/ruby/ApiGraphs.qll Fixed Show fixed Hide fixed

hvitved force-pushed the ruby/splat-flow-performance branch 2 times, most recently from f582dbd to 5ae4441 Compare September 18, 2023 18:44

github-advanced-security bot found potential problems Sep 18, 2023

View reviewed changes

ruby/ql/lib/codeql/ruby/dataflow/internal/DataFlowPrivate.qll Fixed Show fixed Hide fixed

hvitved force-pushed the ruby/splat-flow-performance branch 4 times, most recently from 0f4fe28 to 12ceca8 Compare September 25, 2023 09:23

hvitved added the no-change-note-required This PR does not need a change note label Sep 25, 2023

hvitved marked this pull request as ready for review September 25, 2023 13:02

hvitved requested a review from a team as a code owner September 25, 2023 13:02

calumgrant requested a review from hmac September 25, 2023 13:16

hmac previously approved these changes Sep 27, 2023

View reviewed changes

Ruby: Improve performance of flow through (hash) splats

c570083

hvitved dismissed hmac’s stale review via c570083 September 27, 2023 09:49

hvitved force-pushed the ruby/splat-flow-performance branch from 12ceca8 to c570083 Compare September 27, 2023 09:49

hmac approved these changes Sep 27, 2023

View reviewed changes

hvitved merged commit 56f8d5d into github:main Sep 27, 2023
19 checks passed

hvitved deleted the ruby/splat-flow-performance branch September 27, 2023 12:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ruby: Improve performance of flow through (hash) splats #14229

Ruby: Improve performance of flow through (hash) splats #14229

hvitved commented Sep 15, 2023 •

edited

Loading

Ruby: Improve performance of flow through (hash) splats #14229

Ruby: Improve performance of flow through (hash) splats #14229

Conversation

hvitved commented Sep 15, 2023 • edited Loading

Hash splats

Splats

hvitved commented Sep 15, 2023 •

edited

Loading