Simplify implementation of nn.relu. #25331

Open · wants to merge 2 commits into main
Conversation

@carlosgmartin (Contributor)

No description provided.

@jakevdp (Collaborator) commented Dec 9, 2024

This is one of those changes that's probably OK, but has a small chance of causing some strange unintended numerical regression deep within models that use ReLU. This simplification saves one line of code: I'm inclined to fall back on the "if it's not broken, don't fix it" principle. What do you think?

@carlosgmartin (Contributor, Author) commented Dec 9, 2024

@jakevdp I'd consider that 4 lines of code. 🙂 The test should be considered separately, since it should've existed in the first place. And I think tests in general should be treated as a separate category for the purposes of codebase length, since they're not part of JAX in and of itself but rather tests of it, and we typically want extensive test coverage.

I think we should always take the opportunity to simplify JAX's codebase, unless there's a very strong reason not to. Otherwise we'd be stuck with suboptimal choices forever. Many small improvements add up, and that eases the future maintenance burden.

In this particular case, there's no reason why we should have to bring in the custom_jvp machinery to define ReLU.
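Roughly, the comparison is between something like the following (a sketch for illustration, not the exact jax.nn source):

```python
import jax
import jax.numpy as jnp
from jax import custom_jvp, lax

# Sketch of the existing approach: relu with a hand-written custom JVP rule.
@custom_jvp
def relu_custom(x):
  return jnp.maximum(x, 0)

# The custom JVP passes the tangent through only where x > 0.
relu_custom.defjvps(lambda g, ans, x: lax.select(x > 0, g, lax.full_like(g, 0)))

# Sketch of the simplified definition: let autodiff differentiate `where` itself.
def relu_simple(x):
  return jnp.where(x > 0, x, 0)

x = jnp.array([-1.0, 0.0, 2.0])
print(relu_custom(x), relu_simple(x))  # identical forward values
print(jax.grad(lambda v: relu_custom(v).sum())(x))  # compare gradients
print(jax.grad(lambda v: relu_simple(v).sum())(x))
```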

If any issues do pop up, I'll handle them.

@jakevdp (Collaborator) commented Dec 9, 2024

I would approve a PR adding that test without any comment 😁

Note, however, that your change does modify numerical results. For example, try grad(relu)(nan) with the old and new definitions. Will that make a difference downstream? Hard to say.
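Concretely, one can probe it like this (a sketch; `jnp.where(x > 0, x, 0)` is used here as a stand-in for the proposed definition, and the printed values depend on which implementation is installed):

```python
import jax
import jax.numpy as jnp

# Check the gradient of relu at NaN under the current definition and under
# a plain jnp.where-based one; the two may disagree at this input.
candidates = {
    "current jax.nn.relu": jax.nn.relu,
    "where-based": lambda x: jnp.where(x > 0, x, 0),
}
for name, fn in candidates.items():
    print(name, jax.grad(fn)(jnp.float32(jnp.nan)))
```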

@carlosgmartin (Contributor, Author) commented

@jakevdp If you're taking the grad of relu at nan, you've got bigger problems. 😁

@jakevdp (Collaborator) commented Dec 9, 2024

I'm truly not trying to be difficult here, it's just that many times I've seen changes like this cause headaches down the road, and saving one line of code doesn't strike me as worth the risk.

@carlosgmartin (Contributor, Author) commented

IMO we can always undo it if it does cause issues (though I don't see how it could).

But if you'd rather not, that's fine too.

@carlosgmartin (Contributor, Author) commented Dec 11, 2024

@jakevdp Just in case this addresses your concern, I edited it so that it now produces the same grad at nan.

I also created a separate PR to isolate the other changes. If that one is merged, I'll rebase this one.

@jakevdp (Collaborator) commented Dec 11, 2024

Thanks for splitting the uncontroversial changes into another PR! Let's go ahead and try this: can you rebase on the current main branch? Thanks!

@carlosgmartin (Contributor, Author) commented

@jakevdp Done.

@jakevdp (Collaborator) left a review comment:

Thanks!

@google-ml-butler bot added the kokoro:force-run and pull ready (Ready for copybara import and testing) labels on Dec 11, 2024
@jakevdp (Collaborator) commented Dec 11, 2024

We're seeing some failures on ShardingInTypesTest.test_scan when run on all backends:

TypeError: select cases must have the same shardings, got [NamedSharding(mesh=Mesh('x': 2, 'y': 2), spec=PartitionSpec(None, None)), NamedSharding(mesh=Mesh('x': 2, 'y': 2), spec=PartitionSpec(None, 'y'))].

The test is here:

class ShardingInTypesTest(jtu.JaxTestCase):

I'm not sure how to best address this.

@jakevdp (Collaborator) commented Dec 11, 2024

Maybe use lax.full_like(x, 0) instead of 0 for the last argument of where?
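i.e. something along these lines (a sketch of the suggestion; the intent is that both branches of the `where` then carry matching shardings instead of relying on broadcasting a Python scalar 0):

```python
import jax.numpy as jnp
from jax import lax

def relu(x):
  # Use a zeros array with x's shape and dtype (and, ideally, its sharding)
  # rather than the scalar 0, so the two select branches match.
  return jnp.where(x > 0, x, lax.full_like(x, 0))
```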

@jakevdp (Collaborator) commented Dec 11, 2024

This change is also breaking one flax test, on this line: https://github.com/google/flax/blob/554b690bab07920860acbdb1d4fae03cc516d385/tests/linen/summary_test.py#L676

AssertionError: '1628954' not found in '│        │ Classifier │ float32[1,28,28,1] │ float32[1,10]   │ 1629979 │ 5698101   │                                         │'

@carlosgmartin (Contributor, Author) commented

@jakevdp I'll try that. But does this expose a broader issue or lack of optimization in the compiler or sharding machinery?

@jakevdp (Collaborator) commented Dec 11, 2024

> @jakevdp I'll try that. But does this expose a broader issue or lack of optimization in the compiler or sharding machinery?

I think it's due to the fact that the broadcasting machinery in jax.numpy does not have any logic around sharding.

@carlosgmartin (Contributor, Author) commented

Updated.

> I think it's due to the fact that the broadcasting machinery in jax.numpy does not have any logic around sharding.

Is this considered a current shortcoming? If so, should we open an issue about it (and link back to here)?

@jakevdp (Collaborator) commented Dec 11, 2024

There are some other failures as well that I'm not sure how to deal with: for example, an internal rematerialization test that's looking for a particular pattern in the emitted StableHLO for some complicated model.

The flax test is concerning as well, as it points to the fact that the compiler is generating a quantitatively different program with your PR than with the existing implementation. Given the importance of relu to production models, we'd probably want some comprehensive benchmarks of realistic models to convince ourselves that this change is going in the right direction.

Honestly, I don't think it's worth the effort to track down these fixes and do the analysis necessary to land this change.

@yashk2810 (Collaborator) commented Dec 11, 2024

#25423 should make jnp.where(x > 0, x, 0) work properly with shardings.
