Feature Enhancement: Minimize duplicated recorded tests #562

dgkf · 2024-03-12T19:01:04Z

Supersedes #530, Closes #528

The previous PR never got picked up and the repo has moved forward since then. This PR just rebases the changes onto the latest work in covr. Copying the original description below:

See #528 for a detailed description of the issue. In short, the heuristic used to quickly determine whether we're evaluating a new test was too specific, so the same test ended up being logged many times.

Generally this isn't an issue. Even when a test suite loops over a test, the heuristic usually treats each iteration as the same test, and even when a test is logged on every iteration, the iteration counts are usually low enough that the coverage object doesn't grow substantially. However, when testing fastmap, there were a couple of cases where an expectation is evaluated in a loop with tens of thousands of iterations, and the test was logged on every evaluation, blowing up the memory needed by the coverage object.

To emphasize, these changes only affect behavior when `options(covr.record_tests = TRUE)` is set, which must be opted into.
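Because recording is opt-in, one way to keep the option from leaking into the rest of a session is to set it only for the duration of a single coverage run. This is a minimal base-R sketch; `with_recorded_tests()` is a hypothetical helper (not part of covr), and the commented `covr::package_coverage()` call assumes covr is installed and that `"path/to/pkg"` is a placeholder for a local package:

```r
# Hypothetical helper (not part of covr): evaluate `expr` with test recording
# switched on, then restore whatever value the option had before.
with_recorded_tests <- function(expr) {
  old <- options(covr.record_tests = TRUE)
  on.exit(options(old), add = TRUE)
  expr  # the lazy argument is forced here, while the option is set
}

# Example usage (assumes covr is installed; the path is a placeholder):
# cov <- with_recorded_tests(covr::package_coverage("path/to/pkg"))
```

Relying on lazy evaluation means the coverage call runs inside the helper, after the option is set, and `on.exit()` restores the previous value even if the run errors.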

Technical details annotated as in-line comments below:

dgkf · 2024-03-12T19:01:53Z

R/trace_tests.R

```diff
-#' #      test depth i
-#' # [1,]    1     2 4
+#' #      test call depth i
+#' # [1,]    1    1     2 4
```


A change to the structure of the <coverage>[[<key>]]$tests elements.

Adds a `call` column, which is the number of times the test expression had been evaluated when this trace was hit. It isn't used within covr itself, but since a single recorded test can now represent multiple evaluations, it lets downstream tooling distinguish between those evaluations.
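To illustrate how downstream tooling might use the new column, here is a sketch over a hand-built matrix shaped like the `$tests` element described above. The values are made up; only the column names come from the updated roxygen example:

```r
# Made-up trace log using the columns from the updated roxygen example.
# test = index of the recorded test, call = evaluation count of that test
# when this line was hit; depth and i are as in the original example.
tests <- rbind(
  c(test = 1L, call = 1L, depth = 2L, i = 4L),
  c(test = 1L, call = 2L, depth = 2L, i = 5L),
  c(test = 2L, call = 1L, depth = 3L, i = 6L)
)

# Distinct (test, call) pairs = distinct test evaluations that hit this line.
n_evaluations <- nrow(unique(tests[, c("test", "call"), drop = FALSE]))

# Highest evaluation count seen for each recorded test.
evals_per_test <- tapply(tests[, "call"], tests[, "test"], max)
```

Without the `call` column, rows 1 and 2 would be indistinguishable by test index alone, even though they come from two separate evaluations of test 1.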

dgkf · 2024-03-12T19:02:19Z

R/trace_tests.R

```diff
-  .current_test$src_env <- sys.frame(which = .current_test$last_frame)
+  .current_test$src_env <- sys.frame(which = .current_test$last_frame - 1L)
```


Using the calling frame instead of the evaluation frame is one way that this PR cuts down on unnecessary duplication of tests.
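This relies on standard R semantics: every call to a closure gets a fresh evaluation frame, while the frame one level up the stack is shared by every call made from the same place. A small base-R sketch of that behavior (the function names are illustrative and unrelated to covr's internals):

```r
# Call `f` n times from the same caller and keep, for each call,
# f's own evaluation frame and the frame one level below it on the stack.
collect_frames <- function(n) {
  f <- function() {
    depth <- sys.nframe()  # frame number of this particular call to f
    list(eval = sys.frame(depth), caller = sys.frame(depth - 1L))
  }
  frames <- vector("list", n)
  for (k in seq_len(n)) frames[[k]] <- f()  # `for` adds no frame of its own
  frames
}

# TRUE when every environment in `envs` is the very same object.
all_identical <- function(envs) {
  all(vapply(envs, identical, logical(1), envs[[1]]))
}

frames <- collect_frames(3)
all_identical(lapply(frames, `[[`, "eval"))    # FALSE: a new frame per call
all_identical(lapply(frames, `[[`, "caller"))  # TRUE: one shared calling frame
```

Anything keyed on the evaluation frame therefore looks "new" on every iteration of a loop, whereas the calling frame stays stable across iterations.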

dgkf · 2024-03-12T19:02:59Z

R/trace_tests.R

```diff
+current_test_index <- function() {
+  # check if test has already been encountered and reuse test index
+  if (inherits(.current_test$src, "srcref")) {
+    # when tests have srcrefs, we can quickly compare test keys
+    match(
+      .current_test$key,
+      names(.counters$tests),
+      nomatch = length(.counters$tests) + 1L
+    )
+  } else {
+    # otherwise we compare call stacks
+    Position(
+      function(t) identical(t[], .current_test$trace),  # t[] to ignore attr
+      .counters$tests,
+      right = TRUE,
+      nomatch = length(.counters$tests) + 1L
+    )
+  }
+}
```


If the current test matches an already logged test, then we reuse that index. Otherwise it's added to a growing list of recorded test expressions.

New tests are distinguished by comparing the srcref "key" to previous test keys when the test code has known srcrefs, or otherwise by looking for an identical call stack among the tests already recorded.
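The lookup strategy can be exercised outside covr. The sketch below is a hypothetical standalone rewrite rather than covr code: instead of reading the internal `.current_test` and `.counters` state, `test_index()` and `record_test()` take the recorded stacks, the current srcref key, and the current call stack as arguments, and the key string is made up:

```r
# Hypothetical standalone version of the lookup in current_test_index().
# `logged`: list of recorded call stacks, named by srcref key when known.
# Returns an existing index, or length(logged) + 1L for a new test.
test_index <- function(logged, key, trace) {
  new_index <- length(logged) + 1L
  if (!is.null(key)) {
    # srcrefs known: a cheap string match against previously seen keys
    match(key, names(logged), nomatch = new_index)
  } else {
    # no srcrefs: search from the most recent entry for an identical stack
    Position(function(t) identical(t, trace), logged,
             right = TRUE, nomatch = new_index)
  }
}

# Hypothetical recorder: append the test only if it has not been seen before.
record_test <- function(logged, key, trace) {
  i <- test_index(logged, key, trace)
  if (i > length(logged)) {
    logged[[i]] <- trace
    if (!is.null(key)) names(logged)[i] <- key
  }
  logged
}

# Keyed path: the same expectation evaluated 10,000 times is logged once.
stack <- list(quote(test_that("loop", expect_true(x))))
logged <- list()
for (iter in 1:10000) logged <- record_test(logged, "test-loop.R:3:1", stack)
length(logged)  # 1

# Call-stack path: repeated stacks are reused, a different stack is appended.
logged2 <- list()
for (iter in 1:3) logged2 <- record_test(logged2, NULL, stack)
logged2 <- record_test(logged2, NULL, list(quote(other_test())))
length(logged2)  # 2
```

Searching with `right = TRUE` starts from the most recently recorded stack, which is the likeliest match when a test is evaluated repeatedly in a loop.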

I was expecting call stack comparisons like this to come with a big performance hit, but it ended up being pretty minor.

dgkf · 2024-04-04T20:42:13Z

@jimhester - Any chance I could get your eyes on this?

We've been using this patch for a year and have tested against ~1000 packages at this point without the memory issues we were seeing before.

Briefly, this patch further minimizes duplicated recording of test call stacks. In extreme cases (namely fastmap & rlang, which have tests that hit the same line of code millions of times), this can easily save GBs of memory.

jimhester · 2024-04-07T17:48:20Z

Thanks @dgkf, sorry for the delay in merging!

dgkf · 2024-04-07T17:53:06Z

Thanks, @jimhester! (and no worries - I would have nudged more frequently if it was a major blocker 😉). Thanks as always for covr's maintenance.

dgkf added 6 commits March 12, 2024 14:43

- 5269a5f fixing simple case for duplicate test logs in loops
- 6978f61 initial fix for duplicate tests
- 4461b47 clean up de-duplication code
- 500e2d3 debug code cleanup!
- 94bdbe7 rebase on upstream/main
- 69b1ea0 moving changes to latest dev version NEWS

jimhester merged commit dd5286d into r-lib:main Apr 7, 2024
12 checks passed
