feat: Wasm SIMD implementation of `MatMatMulKer<f32>` #1420

ulan · 2024-05-27T16:17:44Z

To simplify code review, this PR implements operations only for a 4x4 matrix of f32. This implementation will be generalized to support more operations and types in follow-up PRs.

Issue: #1361

Changes

Disable the default features of the criterion and proptest crates because they are not compatible with Wasm. This allows running cargo test to test Wasm changes.
Add a new wasm module gated by target_family = "wasm" and target_feature = "simd128".
Implement WasmMmm4x4 using std::arch::wasm32 intrinsics.
Add test_mmm_kernel_f32! tests for the new implementation.
Add prop tests that compare the new implementation against the existing generic implementation to gain additional test coverage.
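
As a rough illustration of the gating and intrinsics listed above (not the actual WasmMmm4x4 code, which is not shown in this conversation), a 4x4 f32 product could be sketched as follows. The function name `matmul_4x4` and the row-major layout are assumptions made for this example; the scalar fallback is there only so the snippet also builds on non-Wasm targets.

```rust
/// Computes C = A * B for row-major 4x4 f32 matrices (hypothetical helper).
/// SIMD path: compiled only for Wasm targets with simd128 enabled.
#[cfg(all(target_family = "wasm", target_feature = "simd128"))]
pub fn matmul_4x4(a: &[f32; 16], b: &[f32; 16]) -> [f32; 16] {
    use std::arch::wasm32::*;
    let mut c = [0.0f32; 16];
    unsafe {
        // Load each row of B into one 128-bit vector (4 x f32).
        let mut b_rows = [f32x4_splat(0.0); 4];
        for k in 0..4 {
            b_rows[k] = v128_load(b.as_ptr().add(4 * k) as *const v128);
        }
        for i in 0..4 {
            // Row i of C is the sum over k of A[i][k] * (row k of B).
            let mut acc = f32x4_splat(0.0);
            for k in 0..4 {
                let a_ik = f32x4_splat(a[4 * i + k]);
                acc = f32x4_add(acc, f32x4_mul(a_ik, b_rows[k]));
            }
            v128_store(c.as_mut_ptr().add(4 * i) as *mut v128, acc);
        }
    }
    c
}

/// Scalar fallback with the same signature for every other target.
#[cfg(not(all(target_family = "wasm", target_feature = "simd128")))]
pub fn matmul_4x4(a: &[f32; 16], b: &[f32; 16]) -> [f32; 16] {
    let mut c = [0.0f32; 16];
    for i in 0..4 {
        for k in 0..4 {
            for j in 0..4 {
                c[4 * i + j] += a[4 * i + k] * b[4 * k + j];
            }
        }
    }
    c
}
```

Both variants accumulate in the same order over k, so comparing one against the other (as the prop tests above do against the generic implementation) is a reasonable way to check the SIMD path.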

Benchmarking

The onnx-mobilenet-v2 example was used to benchmark the new implementation.

The example was modified to use a PNG image instead of a JPG image because jpeg-decoder returns different image bytes depending on whether Wasm SIMD is enabled or not. See: https://github.com/image-rs/jpeg-decoder/blob/c1a1fe04cc54a5446e57a71ea856afd07cd374b2/src/arch/wasm.rs#L9
The example was also modified to measure only the duration of inference (model.run()).
These example changes were reverted afterwards; see the first commit in this PR for what they looked like.
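
A minimal sketch of timing only the inference call, assuming a hypothetical helper `time_inference` whose closure argument stands in for `model.run()` (the actual benchmark code is in the reverted commit and is not reproduced here):

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// `f` stands in for the inference call; model loading and image decoding stay outside.
pub fn time_inference<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let output = f();
    (output, start.elapsed())
}
```

Keeping setup work outside the closure means the reported duration covers inference alone.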

Results of running with wasmtime -O opt-level=0:

Baseline: 624ms
Wasm SIMD with Rust auto-vectorization: 523ms (1.2x faster)
Wasm SIMD with this implementation: 235ms (2.6x faster)

Results of running with wasmtime:

Baseline: 338ms
Wasm SIMD with Rust auto-vectorization: 321ms (1.05x faster)
Wasm SIMD with this implementation: 206ms (1.6x faster)

ulan · 2024-05-27T16:19:06Z

@kali: could you please take an initial look? I am happy to revert changes in Cargo.toml or remove the new prop tests if you prefer that.

ulan · 2024-05-27T16:24:19Z

Here is the documentation of Wasm SIMD intrinsics: https://doc.rust-lang.org/core/arch/wasm32/index.html
They support 128-bit operations.

kali

That's a very big first step. Thanks a lot for going through with this.

kali · 2024-05-27T16:35:56Z

Cargo.toml

@@ -103,7 +103,7 @@ num-integer = "0.1.44"
 num-traits = "0.2.14"
 openblas-src = { version = "0.10", features = ["static"] }
 paste = "1.0.5"
-proptest = "1.0.0"
+proptest = { version = "1.0.0", default-features = false, features = ["std"] }


What features do we lose here? Is that gonna be a problem?

I pushed a new change to enable some of the default features that are compatible with Wasm.

After that change, for criterion we will lose only rayon:
https://github.com/bheisler/criterion.rs/blob/f1ea31a92ff919a455f36b13c9a45fd74559d0fe/Cargo.toml#L74C12-L74C54

For proptest we will lose fork and timeout:
https://github.com/proptest-rs/proptest/blob/a62a348b59f422161cbc5c6910f83f1b3c3e67e5/proptest/Cargo.toml#L22

These are multi-threading features that affect performance.

If that's unacceptable, I can revert all Cargo.toml changes and keep the current status quo of not running Wasm tests. I think that would be okay because we can require people to run those tests locally when making Wasm-related changes.

We really need to keep proptest fork around (that's super useful to test kernels under address sanitizing, as the sanitizer will abort the process at the first issue). Can we keep proptest whole on every non-Wasm platform with some cfg() tricks in Cargo.toml?

I was also thinking in that direction initially. I couldn't find a way to have a platform-specific dependency in the workspace: rust-lang/cargo#5220

I could explore moving proptest into a crate-local dependency (instead of a workspace one), and then it should be possible to have platform-specific rules.

Worth a try. If it solves the problem, I'm ok with proptest becoming a crate-level dep.

This should be fixed in the latest commit.
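
For reference, a crate-level Cargo.toml can express this kind of split with target-specific dependency tables. The fragment below is only a sketch of the approach discussed in this thread and may differ from what the commit actually does; the feature list for the Wasm case is taken from the diff above.

```toml
# Non-Wasm targets: keep proptest whole (default features, including fork and timeout).
[target.'cfg(not(target_family = "wasm"))'.dev-dependencies]
proptest = "1.0.0"

# Wasm targets: drop the default features that do not build there.
[target.'cfg(target_family = "wasm")'.dev-dependencies]
proptest = { version = "1.0.0", default-features = false, features = ["std"] }
```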

kali · 2024-05-27T16:39:52Z

linalg/src/wasm/tests.rs

+    ((r1, actual), (r2, expected))
+}
+
+proptest! {


What's the logic behind adding these extra tests? Are they covering more than the regular test_mmm_kernel_f32? I'm not asking for their removal, just trying to understand whether something is missing from the standard kernel test suite.

Yes, that was exactly the motivation for adding these tests. I wanted to get some confidence that I didn't introduce bugs. I noticed that test_mmm_kernel_f32 was not covering everything. For example, removing the last line in leaky relu still passes test_mmm_kernel_f32 (IIRC).

Can we make the generic test cases cover these instead? That would benefit all the kernels, including the future Wasm ones that will use a different geometry than 4x4.

I'll try to add a test that covers the missing leaky relu case that I know of. I don't know how to find all missing cases though.

Sure, the generic kernel test suite is a work in progress; it does not cover every corner case and never will. But it is very beneficial to augment it instead of adding tests that are specialized for a given kernel. Now that you have set up the basic building blocks for Wasm kernels, I expect we will pretty soon have half a dozen or more Wasm kernels, so believe me, being able to rely on the generic kernel test suite will make a huge lot of sense...

I fixed the existing leaky relu test to multiply by 1 instead of 0 (which was causing the test to be ineffective because multiplying by 0 always gives 0). I also removed the custom Wasm tests.
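
To illustrate why a factor of 0 hides this kind of bug, here is a small self-contained sketch. It is not tract's test: it assumes the factor scales the test inputs, and the names `leaky_relu`, `broken_leaky_relu`, and `detects_bug` are made up for the example.

```rust
/// Reference leaky relu: negative inputs are scaled by `alpha`.
fn leaky_relu(x: f32, alpha: f32) -> f32 {
    if x < 0.0 { alpha * x } else { x }
}

/// Deliberately broken variant that forgets to scale negative inputs.
fn broken_leaky_relu(x: f32, _alpha: f32) -> f32 {
    x
}

/// Returns true if a test whose inputs are multiplied by `scale`
/// can tell the broken variant apart from the reference.
fn detects_bug(scale: f32) -> bool {
    let alpha = 0.1;
    let inputs = [-2.0f32, -1.0, 0.5, 3.0];
    inputs.iter().any(|&x| {
        let v = x * scale;
        leaky_relu(v, alpha) != broken_leaky_relu(v, alpha)
    })
}
```

With `scale = 0.0` every input collapses to zero and both variants agree, so the bug goes unnoticed; with `scale = 1.0` the negative inputs expose it.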

kali · 2024-05-27T16:42:06Z

What should we add to the CI to get the new tests to run?

ulan · 2024-05-27T19:06:16Z

> What should we add to the CI to get the new tests to run?

Good question! I think we would need to install wasmtime and run in tract/linalg:

RUSTFLAGS='-C target-feature=+simd128' CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime cargo test --target=wasm32-wasi

I am not good with GitHub CI, but I can take a look to see if it is possible to add such a step.

kali · 2024-05-28T06:37:45Z

> > What should we add to the CI to get the new tests to run?
>
> Good question! I think we would need to install wasmtime and run in tract/linalg:
> RUSTFLAGS='-C target-feature=+simd128' CARGO_TARGET_WASM32_WASI_RUNNER=wasmtime cargo test --target=wasm32-wasi
> I am not good with GitHub CI, but I can take a look to see if it is possible to add such a step.

I can try and do that. Working on the CI from a fork is cumbersome.

kali · 2024-05-28T07:28:20Z

So the tests are in place, but as you can see, we have an issue here. (Ignore the problem on nightly, this is a cargo bug that is being fixed).

https://github.com/sonos/tract/actions/runs/9264996443/job/25486167296?pr=1420

I checked, and the issue was there without the kernel, so I'm not sure how to proceed. It looks like we're getting into an unreachable!() in wasmtime. Any chance you can help? (If this is non-trivial, we could move this investigation to a separate PR, fix the tests and rebase this PR.)

ulan · 2024-05-28T15:49:14Z

Thanks a lot for adding the CI jobs! Let me try to reproduce the failure locally.

ulan · 2024-05-28T16:12:01Z

The tests should pass now. The wasmtime unreachable was due to a panic in Rust code. (I saw that after running with -- --nocapture.)

ulan · 2024-06-02T12:37:59Z

@kali: please take another look when you have time. The PR is ready to go from my side.

kali · 2024-06-02T16:50:03Z

It does look pretty good. Thanks a lot for contributing this!

ulan added 2 commits May 27, 2024 15:38

feat: Wasm SIMD implementation of MatMatMulKer<f32>

2651a27

Undo onnx demo changes

05f63f7

ulan changed the title ~~Draft: feat: Wasm SIMD implementation of MatMatMulKer<f32>~~ feat: Wasm SIMD implementation of MatMatMulKer<f32> May 27, 2024

kali reviewed May 27, 2024

Enable Wasm-compatible default features of criterion and proptest

4e9d623

kali added 5 commits May 28, 2024 08:48

setup some tests for wasm

1fb428c

wasm ci test, take 2

b39983e

path to wasmtime

5634b48

path to wasmtime, again

5d12443

path to wasmtime, again, again

ba5690b

Fix tests that are failing on Wasm

48a4e03

ulan added 3 commits June 2, 2024 12:14

Use Wasm-specific criterion/proptest when needed

b594433

Improve LeakyRelu test

080b6e3

Remove custom Wasm tests

db14e6e

kali merged commit 2a2914a into sonos:main Jun 2, 2024
47 checks passed
