Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec update 20241105 #1210

Merged
merged 52 commits into from
Nov 6, 2024
Merged

Spec update 20241105 #1210

merged 52 commits into from
Nov 6, 2024

Conversation

headius
Copy link
Contributor

@headius headius commented Nov 6, 2024

No description provided.

jeremyevans and others added 30 commits November 5, 2024 11:29
Previously, a TypeError was not raised if there were no thread
variables, because the conversion to symbol was done after that
check.  Convert to symbol before checking for whether thread
variables are set to make the behavior consistent.

Fixes [Bug #20606]
[Feature #19236]

When building a large hash, pre-allocating it with enough
capacity can save many re-hashes and significantly improve
performance.

```
/opt/rubies/3.3.0/bin/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::../miniruby-master -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby --disable-gem" \
	            --output=markdown --output-compare -v $(find ./benchmark -maxdepth 1 -name 'hash_new' -o -name '*hash_new*.yml' -o -name '*hash_new*.rb' | sort)
compare-ruby: ruby 3.4.0dev (2024-03-25T11:48:11Z master f53209f023) +YJIT dev [arm64-darwin23]
last_commit=[ruby/irb] Cache RDoc::RI::Driver.new (ruby/irb#911)
built-ruby: ruby 3.4.0dev (2024-03-25T15:29:40Z hash-new-rb 77652b08a2) +YJIT dev [arm64-darwin23]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|new                 |      7.614M|    5.976M|
|                    |       1.27x|         -|
|new_with_capa_1k    |     13.931k|   15.698k|
|                    |           -|     1.13x|
|new_with_capa_100k  |     124.746|   148.283|
|                    |           -|     1.19x|
```
[Bug #20593]

It's fairly common to use `format` to interpolate a number of values
into a user provided strings.

The arguments not matching are a problem when they are positional,
but when they are named, it's absolutely fine and we shouldn't
emit a warning.
* "Allow ambiguosity of `return` line"
  65b991bc8571b7b718fc22bd33a43c4d269bf52d
* "Move to test/.excludes-prism"
  3b4ff810d2fefdf0194bd774bc04f6f17e2ccae7
* "Pending `EVENT_RETURN` settracefunc tests with Prism"
  a7f33c99c69e3cc62b7a24ce35f51f76cc5bfaa2
…r#size

What a "word" is when talking about sizes is confusing because it's a
highly overloaded term. Intel, Microsoft, and GDB are just a few vendors
that have their own definition of what a "word" is. Specs that used the
"wordsize" guard actually were mostly testing for the size of the C
`long` fundamental type, so rename the guard for clarity.

Also, get the size of `long` directly from RbConfig instead of guessing
using Integer#size. Integer#size is not guaranteed to have anything to
do with the `long` type.
There is no guarantee that Integer#size will continue to return
`sizeof(long)` for small integers.

Use the `l!` specifier for Array#pack instead. It is a public
interface that has a direct relationship with the `long` type.
The spec is actually testing a behaviour stemming from NUM2INT(), and
since `sizeof(long)>=sizeof(int)`, `min_long-1` always makes NUM2INT()
raise `RangeError`.
Make Range#step to consistently use + for iteration [Feature #18368]

Previously, non-numerics expected step to be integer,
and iterated with begin#succ, skipping over step value
steps. Since this commit, numeric and non-numeric iteration
behaves the same way, by using + operator.
[Feature #20707]

Converting Time into RFC3339 / ISO8601 representation is an significant
hotspot for applications that serialize data in JSON, XML or other formats.

By moving it into core we can optimize it much further than what `strftime` will
allow.

```
compare-ruby: ruby 3.4.0dev (2024-08-29T13:11:40Z master 6b08a50a62) +YJIT [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-08-30T13:17:32Z native-xmlschema 34041ff71f) +YJIT [arm64-darwin23]
warming up......

|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|time.xmlschema          |      1.087M|    5.190M|
|                        |           -|     4.78x|
|utc_time.xmlschema      |      1.464M|    6.848M|
|                        |           -|     4.68x|
|time.xmlschema(6)       |    859.960k|    4.646M|
|                        |           -|     5.40x|
|utc_time.xmlschema(6)   |      1.080M|    5.917M|
|                        |           -|     5.48x|
|time.xmlschema(9)       |    893.909k|    4.668M|
|                        |           -|     5.22x|
|utc_time.xmlschema(9)   |      1.056M|    5.707M|
|                        |           -|     5.40x|
```
[Feature #20702]

Works the same way than `Hash#fetch_values` for for array.
[Feature #20594]

A handy method to construct a string out of multiple chunks.

Contrary to `String#concat`, it doesn't do any encoding negociation,
and simply append the content as bytes regardless of whether this
result in a broken string or not.

It's the caller responsibility to check for `String#valid_encoding?`
in cases where it's needed.

When passed integers, only the lower byte is considered, like in
`String#setbyte`.
This is a static function only called in two places (rb_to_id and
rb_to_symbol), and in both places, both symbols and strings are
allowed.  This makes the error message consistent with rb_check_id
and rb_check_symbol.

Fixes [Bug #20607]
The absence of either the integer or fractional part should be
allowed.
[Feature #20205]

The warning now suggests running with --debug-frozen-string-literal:

```
test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```

When using --debug-frozen-string-literal, the location where the string
was created is shown:

```
test.rb:3: warning: literal string will be frozen in the future
test.rb:1: info: the string was created here
```

When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW.
When mutating chilled strings and deprecation warnings are not enabled,
the overhead is a simple warning category enabled check.

Co-authored-by: Jean Boussier <[email protected]>
Co-authored-by: Nobuyoshi Nakada <[email protected]>
Co-authored-by: Jean Boussier <[email protected]>
[Bug #20803]

`abc` is used a lot across the ruby spec suite, if another test runs before
this spec is loaded and create this symbol dynamically (`"abc".to_sym`)
the spec will fail.

So it's preferable to use a symbol name that is very unlikely to be
used elsewhere to avoid flakes.
* YJIT: Replace Array#each only when YJIT is enabled

* Add comments about BUILTIN_ATTR_C_TRACE

* Make Ruby Array#each available with --yjit as well

* Fix all paths that expect a C location

* Use method_basic_definition_p to detect patches

* Copy a comment about C_TRACE flag to compilers

* Rephrase a comment about add_yjit_hook

* Give METHOD_ENTRY_BASIC flag to Array#each

* Add --yjit-c-builtin option

* Allow inconsistent source_location in test-spec

* Refactor a check of BUILTIN_ATTR_C_TRACE

* Set METHOD_ENTRY_BASIC without touching vm->running
* It was calculating remaining time once up-front, properly decrementing remaining nanos for the loop, but continuing to sleep based on the original calculation

* Now it calculates the remainder on each iteration, based on the current decremented nanoseconds

* Also moves the spec to a more appropriate shared ruby spec location
@andrykonchin
Copy link
Member

Fixed issues with core/encoding/compatible_spec.rb file.

@headius
Copy link
Contributor Author

headius commented Nov 6, 2024

@eregon:

anything relying on sleep is extremely brittle and will likely cause transients

It doesn't appear to be transient, though. It consistently passes on Windows and Linux and consistently fails on MacOS.

And it's hard to spec sleep without relying on sleep.

seems unrelated to Thread#wakeup and seems about Kernel#sleep instead

Agreed, I will move it.

If it passes reliably on Linux you could guard it

I will do this for now and continue investigating the CI MacOS failures separately.

So far the person syncing the specs also fixed the failures

I think it would be better going forward if the person that introduced a breaking change corrects that change.

@andrykonchin Thank you for correcting those issues!

This is really a sleep spec, so it should be tested under the
Kernel#sleep specs.

Fails consistently on MacOS (in CI) and passes consistently on
Linux and Windows so omit MacOS for now.
@headius headius merged commit a230079 into ruby:master Nov 6, 2024
14 checks passed
@headius headius deleted the spec_update_20241105 branch November 6, 2024 17:53
@eregon
Copy link
Member

eregon commented Nov 7, 2024

It doesn't appear to be transient, though. It consistently passes on Windows and Linux and consistently fails on MacOS.

As I suspected, it fails in TruffleRuby's CI on Linux, both on CRuby and TruffleRuby.
I think we should just remove the spec if it's not made more stable (e.g. with the idea in #1210 (comment)).
I'll quarantine it now in truffleruby's copy.

@eregon
Copy link
Member

eregon commented Nov 7, 2024

And it also has caused failures in CRuby: ruby/ruby@7c893c8
So, yeah, any transient spec relying on sleep timing will be removed or quarantined, it's almost guaranteed to fail in my long-term experience with those.
If it fails on one platform due to timing it's very likely to fail on other platforms too.

graalvmbot pushed a commit to oracle/truffleruby that referenced this pull request Nov 7, 2024
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 7, 2024
@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

Have you considered the possibility that it might actually just be broken in CRuby and TruffleRuby? It has been passing in JRuby's CI consistently for several weeks.

Regardless, it is behavior that was important enough for @ioquatix to file issues with both TruffleRuby and JRuby. Perhaps he can weigh in on how to verify the behavior in a more reliable way.

@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

I should also point out that in the pr where I am investigating the failure, CRuby isn't just off a little bit. It's off by an order of magnitude (8-35x off from expected). Something is causing a 0.1ms sleep to take way longer on Darwin. This is not just an inconvenience for testing. I think it exposes a real problem.

@eregon
Copy link
Member

eregon commented Nov 7, 2024

man nanosleep on Linux makes it pretty clear there are no guarantees (e.g. nanosleep() suspends the execution of the calling thread until either at least the time specified in *req has elapsed, or ...) and on a busy system sleep calls take longer than what's asked.
Also I guess JRuby uses java.lang.Thread#sleep(long, int) like TruffleRuby does.

Anyway, feel free to try to improve this spec or check CRuby's sleep impl.
The issue is adding a transiently failing spec is causing havoc everywhere.

@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

And I'm saying it wasn't transiently failing when I merged it over. It was consistently passing on Linux and Windows and consistently failing on Darwin by a wide margin. I went with your suggestion to leave it enabled only on the passing platforms and opened a draft PR to investigate further.

I don't care if it is quarantined for now but I believe it is wrong to dismiss the failures as being due solely to unreliable sleep times. Being off by an order of magnitude is not just unreliable, it's broken.

@eregon eregon mentioned this pull request Nov 7, 2024
@ioquatix
Copy link
Member

ioquatix commented Nov 7, 2024

IIRC, the issue was sleep(x) for some values of x < 1ms (again, IIRC), was the same as sleep(0) on JRuby and TruffleRuby due to the internal implementation truncating values. If used in a loop, this behaviour can cause the loop to execute far more rapidly than expected. I think the expectation is that sleep(x) sleeps for >= a wall clock time of x.

If your sleep(x) is sleeping for less than x, or consistently more than 10x, I think the implementation may have issues. It's true that a over-subscribed system may put a chunky factor on sleep(x), so it's not good to rely in sleep(x) <= fx where f is some factor like 2 or 3. In practice, as has been discussed and shown by tests time and time again, it's extremely likely to be problematic (i.e. "flaky tests")

      (actual_duration > 0.01).should == true # 100 * 0.0001 => 0.01
      (actual_duration < 1).should == true

The first assertion seems totally reasonable to me, although technically it could be >= if we are pedantic.

The second assertion is frustratingly problematic. On an oversubscribed system, sleep could be out by a large factor 10-100x, especially on GitHub runners, and especially the macOS runners, which seem to be far more over-subscribed in practice, leading to a lot of scheduling issues.

The only reasonable way I've found to deal with this is to try and detect how over-subscribed the system is and try to compensate for it, when dealing with these kinds of tests. The other option is to simply remove assertions on upper bounds. Trying to add a multiplier of 100x is still bound to be flaky on some edge cases (bell curve etc).

As a counter point, on a normal under-subscribed system (not overloaded), sleep(x) should be within some reasonable factor of x, maybe at most 10%. It's also true, there are sometimes bugs in the implementation that cause that factor to be large (invalid scheduling, problematic implementation, etc). Therefore, this kind of test will catch those kinds of problems. So I do believe that there is value in such tests, just that it's extremely hard to express without being flaky.

Bear in mind that adding a 100x factor may also hide bugs... In other words, the assertion, when run on an under-subscribed system was better for detecting issues. It's an extremely tricky issue. Maybe we need to differentiate between "hard failure" (assertion) and some kind of informational "something looks wrong".

(Sorry, I'm not sure this really helps much, I've been here multiple times with async's scheduler and I understand the pain and frustration).

@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

@ioquatix thank you for the input! I agree that the excessively long sleep times on MRI seem like they're a problem to be investigated. For now, I have simply removed the < 0.03 check since the purpose of this spec and fix was to check that sleeps were long enough, not whether they are too long.

I won't have time to look further into why MRI failed the 0.03 check but I think somebody should.

@eregon
Copy link
Member

eregon commented Nov 7, 2024

@ioquatix that makes a lot of sense and matches my experiences as well.
@headius removed the upper bound in https://github.com/ruby/spec/pull/1212/files so that should no longer fail transiently based on how oversubscribed is the system.


@headius Did you get some conflicts when rebasing specs? Anything non-trivial?
Sometimes it can be not so easy to resolve and typically one needs to look at the changes to that file on 2 branches to see what each commit meant to do.

@ioquatix
Copy link
Member

ioquatix commented Nov 7, 2024

the excessively long sleep times on MRI seem like they're a problem to be investigated

Is this when running locally, e.g. under-subscribed system, or when running in CI, where we can expect it to be "problematic"?

@eregon
Copy link
Member

eregon commented Nov 7, 2024

FWIW in oracle/truffleruby#2716 (comment) I made some measurements.
It looks like sub-millisecond sleeps are very imprecise, on all Ruby implementations, most likely because the OS doesn't give much guarantees there.

@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

Did you get some conflicts when rebasing specs? Anything non-trivial?

@eregon There were a few small conflicts, I think both TruffleRuby and CRuby added the same specs. Nothing that was difficult to resolve.

Is this when running locally, e.g. under-subscribed system, or when running in CI, where we can expect it to be "problematic"?

@ioquatix Locally for me on MacOS and in CI on Linux and Windows it passed consistently. CI on MacOS consistently failed, with 100 sleeps of 0.1ms taking anywhere from 80ms to 350ms.

Apparently TruffleRuby also had issues.

JRuby passes it consistently in CI on all platforms we test (mostly Linux).

@headius
Copy link
Contributor Author

headius commented Nov 7, 2024

It looks like sub-millisecond sleeps are very imprecise

Perhaps I'm looking at it wrong, but it seems like only TruffleRuby is having difficulties there. JRuby, now fixed, always has a quantum under in the range of 0.003 to 0.005 on my M1 Mac.

It's also hard to say it's the sleep that's imprecise in your testing. If it warms up does it stabilize for TR?

CRuby is always around 0.003 except when it GCs (looping over the example 10k times).

@eregon
Copy link
Member

eregon commented Nov 8, 2024

It's the same for all Rubies, in theory they should all print 0.10... but they don't (this is on a pretty idle Linux desktop):

$ ruby -v -rbenchmark -e '3.times { p Benchmark.realtime { 1000.times { sleep 0.0001 } } }'
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
0.15385151699956623
0.1538250559988228
0.1538307459995849

$ ruby -v -rbenchmark -e '3.times { p Benchmark.realtime { 1000.times { sleep 0.0001 } } }'
jruby 9.4.9.0-SNAPSHOT (3.1.4) 2024-11-04 746b84aa02 OpenJDK 64-Bit Server VM 17.0.12+7 on 17.0.12+7 +jit [x86_64-linux]
0.16409351599941147
0.15885490400069102
0.15931887199985795

$ ruby -v -rbenchmark -e '3.times { p Benchmark.realtime { 1000.times { sleep 0.0001 } } }'
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM Native [x86_64-linux]
0.16984848700121802
0.16197076400021615
0.16043854000054125

$ ruby -v -rbenchmark -e '3.times { p Benchmark.realtime { 1000.times { sleep 0.0001 } } }'
truffleruby 24.1.1, like ruby 3.2.4, Oracle GraalVM JVM [x86_64-linux]
0.16442835199995898
0.16147058500064304
0.15863075099878188

@eregon
Copy link
Member

eregon commented Nov 8, 2024

Using strace -T (+ --follow-forks on JRuby), they all do:

futex(0x822210, FUTEX_WAKE_PRIVATE, 1)  = 0 <0.000016>
futex(0x823260, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=11789, tv_nsec=488181439}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out) <0.000148>
futex(0x822210, FUTEX_WAKE_PRIVATE, 1)  = 0 <0.000015>
futex(0x823260, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=11789, tv_nsec=488404312}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out) <0.000148>
futex(0x822210, FUTEX_WAKE_PRIVATE, 1)  = 0 <0.000015>
futex(0x823260, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=11789, tv_nsec=488626766}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out) <0.000149>
futex(0x822210, FUTEX_WAKE_PRIVATE, 1)  = 0 <0.000016>
futex(0x823260, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=11789, tv_nsec=488848929}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out) <0.000149>

Notice how each sleep is too long by about 50% / 0.00005s.

@eregon
Copy link
Member

eregon commented Nov 8, 2024

My take on this is we are not using real-time operating systems, so sleep() is at least as much as passed, with pretty much no guaranteed upper bound under load or when the sleep amount is small (e.g. below 1ms).

@ioquatix
Copy link
Member

ioquatix commented Nov 8, 2024

Nice analysis! It's great to see the original problem I had (sleep(small x) = no op) gone and all platforms behaving very similarly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.