
segfault recently in codecov() #567

Open
MichaelChirico opened this issue May 1, 2024 · 4 comments

Comments

@MichaelChirico
Contributor

We have recently been observing a segfault in our codecov runs, e.g.:

https://github.com/Rdatatable/data.table/actions/runs/8885891976/job/24398250857?pr=6107

Unfortunately I haven't been able to reproduce it outside CI, but it's pretty persistent across all commits in the past 2 days or so:

https://github.com/Rdatatable/data.table/actions/workflows/test-coverage.yaml

The stack trace points into try(), but I'm not sure how to proceed from here. Any ideas?

@mcol
Contributor

mcol commented Aug 27, 2024

Your last green run on master was on 24 April, which is when R 4.4.0 was released. Something must have changed in that release that is negatively affecting covr. In R-Lum/Luminescence#144 we've also attempted to debug our incredibly long coverage times, and we reached the conclusion that they started with R 4.4.0: we didn't see any segfault, but coverage times have increased by at least 10x. We're now running coverage on R 4.3.3 without issues.

@struckma

> Your last green run on master was on 24 April, which is when R 4.4.0 was released. Something must have changed in that release that is negatively affecting covr. In R-Lum/Luminescence#144 we've also attempted to debug our incredibly long coverage times, and we reached the conclusion that they started with R 4.4.0: we didn't see any segfault, but coverage times have increased by at least 10x. We're now running coverage on R 4.3.3 without issues.

Indeed, I have also observed a connection with R 4.4.0, but I could not identify the reason. I also dug into R's C sources, but nothing there is obviously responsible. The NEWS.md mentions some changes around I/O, and there was a change in the serialization code.

I could not reliably reproduce the issue in a toy example, but it seems to be related to containerization, parallel test execution, and maybe to the way the R tests are started (e.g. using the code = {} argument of covr::package_coverage()).
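For reference, the kind of invocation being described might look roughly like this (a minimal sketch; the explicit test command is a placeholder, not taken from the reporter's setup):

```r
library(covr)

# Illustrative sketch: run coverage with an explicit test command passed via
# the `code` argument instead of the default test runners.
cov <- package_coverage(
  path = ".",                       # package source directory
  type = "none",                    # skip the built-in tests/vignettes/examples runners
  code = "testthat::test_local()"   # placeholder test command run under coverage
)
```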

@mcol
Contributor

mcol commented Sep 14, 2024

I've bisected the R changes between 4.3.3 and 4.4.0 (using wch's R-source mirror), and I identified this commit as the one where the problems started. This refers to Bugzilla PR 16756, which indeed mentions covr as being related.

Commenting out these two lines (added in that commit) on the current trunk fixes the issue we've been seeing in R-Lum/Luminescence#144 (comment):
https://github.com/wch/r-source/blob/e391fe166e7af8991b36c4d4f124bf091aa3c702/src/library/utils/R/sourceutils.R#L153-L154

Unfortunately, I'm not sure how to take this forward, as I understand neither the original problem in PR 16756 nor its solution. 😐

@MichaelChirico, perhaps you could see whether commenting out those lines fixes the segfault you've been seeing?

@MichaelChirico
Contributor Author

MichaelChirico commented Sep 25, 2024

Hmm, I tried pinning our GHA to R 4.3 and still got a segfault:

Rdatatable/data.table@ab676f4

Scratch that, I guess it was a caching issue fixed by switching to the "latest" codecov GHA cribbed from dplyr:

Rdatatable/data.table#6540

Once I did that, I observed the same thing as @mcol -- the codecov GHA being extremely slow, but tolerable when switching to a pinned version of R 4.3.3.

kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that run into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
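The fallback logic described in this commit message — splitting the concatenated package source on #line directives and parsing each file's chunk separately — can be sketched as follows (illustrative names and directive format, not covr's actual implementation):

```r
# Illustrative sketch of the per-file fallback described above.
split_on_line_directives_sketch <- function(lines) {
  dirs <- grep('^#line 1 "', lines)
  if (length(dirs) == 0) return(NULL)  # guard: input without any directive
  files <- sub('^#line 1 "(.*)"$', "\\1", lines[dirs])
  starts <- dirs + 1
  ends <- c(dirs[-1] - 1, length(lines))
  # SIMPLIFY = FALSE keeps a list even for a single-file package,
  # where mapply would otherwise collapse the result to a matrix.
  res <- mapply(function(s, e) lines[s:e], starts, ends, SIMPLIFY = FALSE)
  names(res) <- files
  res
}

src <- c('#line 1 "R/a.R"', "f <- function() 1",
         '#line 1 "R/b.R"', "g <- function() 2")
# Returns a list with one character vector of lines per source file,
# on which utils::getParseData() can then be applied file by file.
split_on_line_directives_sketch(src)
```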
kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
jimhester pushed a commit that referenced this issue Nov 19, 2024
* split_on_line_directives: guard against input without a directive

get_parse_data extracts lines from the input srcfile object and feeds
them to split_on_line_directives, which expects the lines to be a
concatenation of all the package R files, separated by #line
directives.

With how get_parse_data is currently called, that expectation is met.
get_parse_data is called only if utils::getParseData returns NULL, and
getParseData doesn't return NULL for any of the cases where the input
does _not_ have line directives (i.e. entry points other than
package_coverage).

An upcoming commit is going to move the get_parse_data call in front
of the getParseData call, so update split_on_line_directives to detect
the "no directives" case.

Without this guard, the mapply call in split_on_line_directives would
error under an R version before 4.2; with R 4.2 or later,
split_on_line_directives returns empty.

* split_on_line_directives: fix handling of single-file package case

split_on_line_directives breaks the input at #line directives and
returns a named list of lines for each file.

For a package with a single file under R/, there is one directive.
The bounds calculation is still correct for that case.  However, the
return value is incorrectly a matrix rather than a list because the
mapply call simplifies the result.

At this point, this bug is mostly [*] unexposed because this code path
is only triggered if utils::getParseData returns NULL, and it should
always return a non-NULL result for the single-file package case.  The
next commit will reorder things, exposing the bug.

Tell mapply to not simplify the result.

[*] The simplification to a matrix could also happen for multi-file
    packages in the unlikely event that all files have the same number
    of lines.

* parse_data: promote custom parse logic for R 4.4 compatibility


Closes #576
Closes #579
Re: #567
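The reordering that the merged commit describes could be sketched roughly like this (get_parse_data and get_tokens are covr internals; this is an illustrative approximation under that assumption, not covr's actual code):

```r
# Illustrative approximation of the reordered lookup (not covr's actual code).
get_tokens_sketch <- function(srcfile) {
  # Try the custom per-file extraction first: before R 4.4.0 this was
  # effectively the code path used for package coverage anyway.
  pd <- get_parse_data(srcfile)        # assumed covr-internal helper
  if (is.null(pd)) {
    # Fall back to utils::getParseData() for non-package coverage cases,
    # where the source has no #line directives to split on.
    pd <- utils::getParseData(srcfile)
  }
  pd
}
```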