Publish SIMD notes #1721

14 changes: 7 additions & 7 deletions simd/2024/SIMD-04-12.md
@@ -36,13 +36,13 @@ Logistics: is this still a good time slot for everybody? Maybe we should file an

### Attendees

Anton Kirilov
Deepti Gandluri
Ilya Rezvov
Petr Penzin
Shravan Narayan
Thomas Lively
Yury Delendik
- Anton Kirilov
- Deepti Gandluri
- Ilya Rezvov
- Petr Penzin
- Shravan Narayan
- Thomas Lively
- Yury Delendik

### Update and discussion on fp16 support

1 change: 1 addition & 0 deletions simd/2024/SIMD-06-21.md
@@ -42,6 +42,7 @@ This meeting will be a Google Meet video conference.
- Yury Delendik

### Update and discussion on FP16

AB: I am curious about instruction lowering and when hardware support for it would be available across the board

IR: (Shares link to [Lowering](https://github.com/WebAssembly/half-precision/blob/main/proposals/half-precision/Lowering.md))
52 changes: 51 additions & 1 deletion simd/2024/SIMD-07-19.md
@@ -32,5 +32,55 @@ This meeting will be a Google Meet video conference.

## Meeting notes

TBD
### Attendees

- Anton Kirilov
- Petr Penzin
- Yury Delendik

### Horizontal operations

https://github.com/WebAssembly/flexible-vectors/issues/65

AK: The goal is to pattern match some vector additions and some shuffles.

PP: was hoping to talk about this before LLVM patch gets merged … What is the pattern?

AK: AFAIK the patch produces horizontal operations as a series of addi

PP: yes, there seems to be fp32

AK: this should help other runtimes

PP: we should document that, given this is a bit more hardware oriented patch

AK: there was a patch for integer splats vs integer … Were horizontal ops discussed in SIMD proposal before?

PP: https://github.com/WebAssembly/simd/issues/20 There are some concerns for performance

AK: Horizontal ops are also slower on Arm, but still useful

AB: Looking at the LLVM PR. Seems like patch adds pairwise rather than split in half, why was that done?

AK: The original output had shuffle masks that would change with subtle changes to input
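
To illustrate the pairwise versus split-in-half question above, here is a minimal scalar C++ sketch of the two reduction shapes for a 4-lane float vector. It is only a model for discussion and not the output of the LLVM patch; the names `V4`, `reduce_pairwise` and `reduce_halves` are hypothetical.

```cpp
#include <array>

using V4 = std::array<float, 4>;

// Pairwise style: the first step adds neighbouring lanes (0+1, 2+3).
// Lanes that the next step does not read are set to zero for clarity.
inline float reduce_pairwise(const V4& v) {
    V4 step1 = {v[0] + v[1], v[2] + v[3], 0.0f, 0.0f};
    return step1[0] + step1[1];
}

// Split-in-half style: the first step adds the upper half onto the lower half.
inline float reduce_halves(const V4& v) {
    V4 step1 = {v[0] + v[2], v[1] + v[3], 0.0f, 0.0f};
    return step1[0] + step1[1];
}
```

Both shapes take the same number of steps. For floating-point lanes the two orders of addition can round differently, so they are not guaranteed to be bit-identical in general.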

PP: Since this is pairwise would this interact with x86 pattern matching?

YD: We are working on shuffle lowering for horizontal min. Shuffles generated for autovectorized ops are not performance efficient and non-deterministic. Sometimes part of the shuffle mask represents lanes that are discarded. We are having a problem when we can’t make the right selection for either Arm or x86. I am wondering whether we should even have a shuffle with non-deterministic output.

AK: Horizontal min reduction

PP: discarded lane indices (FK’s patch)

AK: Neon only has byte shuffles, SVE has other shuffles

YD: https://bugzilla.mozilla.org/show_bug.cgi?id=1887312

PP: what is the incompatibility between x86 and Arm here

YD: For example rotate produces a shuffle that matches neither of the two styles… Autovectorizer

AK: in Sam’s notes one of the places this is coming from is SPEC, autovectorizer

PP: there is cost model problem

36 changes: 0 additions & 36 deletions simd/2024/SIMD-08-02

This file was deleted.

100 changes: 100 additions & 0 deletions simd/2024/SIMD-08-02.md
@@ -0,0 +1,100 @@
![WebAssembly logo](/images/WebAssembly.png)

## Agenda for the August 2 video call of WebAssembly's SIMD Subgroup

- **Dates**: 2024-08-02
- **Times**:
    - 3pm-4pm UTC (8am-9am PDT)
- **Location**: *link on calendar invite*
- **Contact**:
    - Name: Petr Penzin
    - Email: [email protected]


### Registration

Fill out [sign-up form](https://forms.gle/bscWhsD9U4hZEsUV9) to attend.

### Logistics

This meeting will be a Google Meet video conference.

## Agenda items

1. Opening, welcome and roll call
    1. Opening of the meeting
    1. Introduction of attendees
1. Find volunteers for note taking
1. Adoption of the agenda
1. Proposals and discussions
    1. Continue discussion of shuffle masks (https://github.com/WebAssembly/flexible-vectors/issues/66)
1. Closure

## Meeting notes

### Attendees

- Andrew Brown
- Anton Kirilov
- Petr Penzin
- Yury Delendik

### Continue discussion of shuffle masks

https://github.com/WebAssembly/flexible-vectors/issues/66

PP: do you have sources?

YD: no, but I can reproduce it via simple C++ code, something like:

```
unsigned char arr[10000];
unsigned char m = 255; // start from the maximum so the comparison can lower it
for (int i = 0; i < 10000; i++) if (arr[i] < m) m = arr[i];
```

PP: We should suggest to LLVM how to generate better patterns for these operations

YD: You have to load the mask into a register, perform the shuffle and then post-process the results

PP: Having a few of them in a row would really explode register usage

AK: There is still value to add horizontal reduction to Wasm irrespective of whether we do something about loop tails in LLVM. Not sure about the other three examples.
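
As a rough illustration of why some shuffle-mask indices end up as "don't care" in such a reduction, here is a scalar C++ model of a 16-lane byte horizontal min built from shuffle and min steps. This is a sketch under assumptions and not what LLVM or any engine actually emits; `shuffle`, `lane_min`, `horizontal_min` and the `filler` parameter are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using U8x16 = std::array<std::uint8_t, 16>;

// Scalar model of a byte shuffle: out[i] = v[mask[i]].
inline U8x16 shuffle(const U8x16& v, const std::array<int, 16>& mask) {
    U8x16 out{};
    for (int i = 0; i < 16; ++i) {
        assert(mask[i] >= 0 && mask[i] < 16);  // indices must select a real lane
        out[i] = v[mask[i]];
    }
    return out;
}

// Lane-wise unsigned minimum.
inline U8x16 lane_min(const U8x16& a, const U8x16& b) {
    U8x16 out{};
    for (int i = 0; i < 16; ++i) out[i] = std::min(a[i], b[i]);
    return out;
}

// Horizontal min in four shuffle+min steps. `filler` is the index used for
// lanes whose result is never read later; any valid index gives the same answer.
inline std::uint8_t horizontal_min(U8x16 v, int filler = 0) {
    for (int width = 8; width >= 1; width /= 2) {
        std::array<int, 16> mask;
        for (int i = 0; i < 16; ++i) mask[i] = (i < width) ? i + width : filler;
        v = lane_min(v, shuffle(v, mask));
    }
    return v[0];
}
```

Only the low `width` lanes of each step feed the final result, so the remaining mask entries can hold arbitrary valid indices. That freedom is what makes the generated masks vary and hard to pattern match.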

YD: https://github.com/Microsoft/onnxruntime

PP: Pattern 3 looks like 8->32 bit extend, but the 4 elements it extending it would produce

AK: If we have this flexibility we could broadcast the element. Non-determinism w.r.t. masks

YD: Maybe we can ask for a relaxed variant

PP: In the flexible vectors proposal we can express just the shrinking if we implement per-value length, but it likely would just move the uncertainty somewhere else

YD: the only way to experiment is to measure ONNX, we should figure out how to run that as a benchmark

YD: Bugzilla bug https://bugzilla.mozilla.org/show_bug.cgi?id=1887312

PP: Maybe this is the source: https://github.com/xenova/transformers.js

YD: Don’t yet know why V8 is faster

AK: Maybe it is matching specific permutation patterns

PP: to summarize maybe disable LLVM, adding operation, and indicate indices to be ignored in the result. Numbers would be good if we are going to present new instruction proposals.

AK: Similar question for flexible vectors, if you get a value and then increase its length

YD: maybe set the lanes to zero explicitly by a SIMD `and` after the shuffle and pattern match on that
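
A minimal scalar C++ sketch of the "shuffle, then `and` with a constant" idea: once the unused lanes are zeroed explicitly, the result no longer depends on whatever indices the mask holds for those lanes. This is only a model of the suggestion; `shuffle`, `lane_and`, `shuffle_then_zero` and the `live` parameter are hypothetical names.

```cpp
#include <array>
#include <cstdint>

using U8x16 = std::array<std::uint8_t, 16>;

// Scalar model of a byte shuffle: out[i] = v[mask[i]], indices in [0, 16).
inline U8x16 shuffle(const U8x16& v, const std::array<int, 16>& mask) {
    U8x16 out{};
    for (int i = 0; i < 16; ++i) out[i] = v[mask[i]];
    return out;
}

// Lane-wise bitwise AND.
inline U8x16 lane_and(const U8x16& a, const U8x16& b) {
    U8x16 out{};
    for (int i = 0; i < 16; ++i) out[i] = a[i] & b[i];
    return out;
}

// Keep only the low `live` lanes (0 <= live <= 16) of the shuffle result
// and force every other lane to zero.
inline U8x16 shuffle_then_zero(const U8x16& v, const std::array<int, 16>& mask, int live) {
    U8x16 keep{};  // all lanes start as zero
    for (int i = 0; i < live; ++i) keep[i] = 0xFF;
    return lane_and(shuffle(v, mask), keep);
}
```

Two masks that agree on the live lanes and differ elsewhere produce identical output here, which is the property an engine could pattern match on.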

PP: would that be too much dataflow analysis for JS engine?

YD: don’t know yet, but we are considering adding an additional pass for dataflow

PP: maybe room for dataflow in other direction by taking lanes out of explicitly zero value if only one value really used

YD: maybe also interleave

PP: LLVM can produce some cheaper pattern, like splat of 4-byte lane, but that goes back to LLVM not having a cost model

https://github.com/llvm/llvm-project/issues/101725 filed

101 changes: 101 additions & 0 deletions simd/2024/SIMD-09-27.md
@@ -0,0 +1,101 @@
![WebAssembly logo](/images/WebAssembly.png)

## Agenda for the September 27 video call of WebAssembly's SIMD Subgroup

- **Dates**: 2024-09-27
- **Times**:
    - 3pm-4pm UTC (8am-9am PDT)
- **Location**: *link on calendar invite*
- **Contact**:
    - Name: Petr Penzin
    - Email: [email protected]


### Registration

Fill out [sign-up form](https://forms.gle/bscWhsD9U4hZEsUV9) to attend.

### Logistics

This meeting will be a Google Meet video conference.

## Agenda items

1. Opening, welcome and roll call
    1. Opening of the meeting
    1. Introduction of attendees
1. Find volunteers for note taking
1. Adoption of the agenda
1. Proposals and discussions
    1. Continue discussion of shuffle masks (https://github.com/WebAssembly/flexible-vectors/issues/66)
1. Closure

## Meeting notes

### Attendees

- Andrew Brown
- Anton Kirilov
- Brendan Dahl
- Petr Penzin
- Sergey Rubanov
- Yury Delendik

### Hardware Specialized WebAssembly

AB giving an overview of https://github.com/WebAssembly/design/issues/1528, interested in feedback.

AK is wondering how software emulation would work for the builtins that are not supported on the platform.

AB: there is a detection mechanism for whether a builtin is supported natively by the platform.

AK: If a developer provides an alternative implementation, it would likely be different from the implementation that uses the accelerated version; would that be an issue?

AB: As an example, XNNPACK would provide different kernel implementations based on hardware support
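
The detect-then-fall-back pattern discussed here can be sketched in plain C++ as selecting between two kernels at startup. This is an illustration only and does not use the proposal's actual mechanism or any XNNPACK API; `dot_fallback`, `dot_accelerated_stand_in`, `select_dot` and the `builtin_available` flag are hypothetical, and the "accelerated" kernel is an ordinary function standing in for a native builtin.

```cpp
#include <cstddef>
#include <cstdint>

using DotFn = std::int32_t (*)(const std::int8_t*, const std::int8_t*, std::size_t);

// Portable fallback: straightforward scalar dot product.
inline std::int32_t dot_fallback(const std::int8_t* a, const std::int8_t* b, std::size_t n) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) acc += std::int32_t(a[i]) * std::int32_t(b[i]);
    return acc;
}

// Stand-in for a kernel written against an accelerated builtin. It is
// structured differently (four partial sums) but must return the same value.
inline std::int32_t dot_accelerated_stand_in(const std::int8_t* a, const std::int8_t* b, std::size_t n) {
    std::int32_t lanes[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            lanes[k] += std::int32_t(a[i + k]) * std::int32_t(b[i + k]);
    std::int32_t acc = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) acc += std::int32_t(a[i]) * std::int32_t(b[i]);  // tail
    return acc;
}

// `builtin_available` models the result of the detection step (assumption).
inline DotFn select_dot(bool builtin_available) {
    return builtin_available ? dot_accelerated_stand_in : dot_fallback;
}
```

The two kernels are written differently, which is the situation AK raises: the fallback and the accelerated path need to agree on results even though their code is not the same.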

AK: Would this be similar to high-level API?

AB: The CG discussion is leaning towards limiting the size of the builtins, I personally think it is maybe OK to try that, at least explore it

AK: Another question - maintaining builtins database, if we worry about the speed of CG proposals, this might be similar

AB: That might be a concern, though for the sake of trying this, we might want to lift process restrictions and let engines add builtins

AK: Centralized process would reduce the risk of multiple builtins with slightly different semantics

AB: The proposal includes a way to ensure that fallback code is doing what is expected

YD: We have experience with a builtin implementation already where we load imported functions and then substitute with native implementation with a fallback path, that is pretty much identical to the proposal. The downside is that it is hard for developers to rely on fast builtin functions. This is not exposed to the web, only on extension level, look up mozIntGemm, also ticket: https://bugzilla.mozilla.org/show_bug.cgi?id=1720747

AK: From our experience with optimized implementations, addition of new instructions is a bit easier for developers to target.

AK: For the libraries required as native implementation there is going to be an issue with integration, as engines are implemented differently

PP: This is going to be an issue for long builtins and less for the ones producing a single instruction

YD: We need to have the builtins integrated into our compilation pipelines to integrate with register allocation, etc.

AK: Couple thoughts. The registry should be machine-readable, which I think you have already. This can be even used to add assembly sequences to that, but maybe at a later point. Assembly templates that proposal discusses might help.

PP: Worth (again) mentioning that CG process plays a role, and maybe we need to improve that to some extent. On the other hand, opcode space is so full it is worth relieving some pressure there.

AK: What is the deprecation process? I think that is one of the motivations for the CG process.

YD, AB: fallback _is_ the deprecation process

YD: Who would be the authority to add/remove the builtins?

PP: There is a case to be made of having more than one authority/registry

AB: Builtins subgroup, which would figure that out eventually?

YD: Can we borrow from JS builtins, maybe

AB - asks BD and SR for any thoughts

BD is curious about the tool integration story, AK suggests function multiversioning.

YD: JS strings is going to be similar to this proposal

AK: Bulk memory ops could’ve been implemented via this style of proposal

67 changes: 67 additions & 0 deletions simd/2024/SIMD-11-22.md
@@ -0,0 +1,67 @@
![WebAssembly logo](/images/WebAssembly.png)

## Agenda for the November 22 video call of WebAssembly's SIMD Subgroup

- **Dates**: 2024-11-22
- **Times**:
    - 3pm-4pm UTC (8am-9am PDT)
- **Location**: *link on calendar invite*
- **Contact**:
    - Name: Petr Penzin
    - Email: [email protected]


### Registration

Fill out [sign-up form](https://forms.gle/bscWhsD9U4hZEsUV9) to attend.

### Logistics

This meeting will be a Google Meet video conference.

## Agenda items

1. Opening, welcome and roll call
    1. Opening of the meeting
    1. Introduction of attendees
1. Find volunteers for note taking
1. Adoption of the agenda
1. Proposals and discussions
    1. Relaxed SIMD trunc NaN semantics
1. Closure

## Meeting notes

### Attendees

- Yury Delendik
- Evan Nemerson
- Brendan Dahl
- Petr Penzin

https://github.com/WebAssembly/relaxed-simd/pull/144 and https://github.com/WebAssembly/relaxed-simd/pull/140

YD: this was passing on some particular test because liftoff was producing just the right output, neither turbofan, nor liftoff pass this test

YD: 140 is adding alternative values produced by other engines, AR in 144 said that was rather arbitrary

EN: can we just accept any value when a NaN value is passed

PP: converting NaN to int is not a valid operation, should that be an error?

YD: if we tighten the semantics (i.e. raise error) we get non-relaxed version

EN: what is the difference between relaxed and strict?

YD: number of operations

PP: if you sanitize the inputs you end up with more instructions
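
As a rough scalar model of what "sanitizing the inputs" costs, the sketch below spells out the saturating (non-relaxed) float-to-int32 conversion, where NaN becomes 0 and out-of-range values clamp. The function name `trunc_sat_f32_to_i32` is hypothetical, and this is not code from either PR. The relaxed variant, as discussed above, leaves the result for NaN and out-of-range inputs up to the implementation so that these extra checks can be dropped.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Scalar model of saturating float -> int32 truncation:
// NaN maps to 0, values outside the int32 range clamp to the nearest limit.
inline std::int32_t trunc_sat_f32_to_i32(float x) {
    if (std::isnan(x)) return 0;
    if (x >= 2147483648.0f) return std::numeric_limits<std::int32_t>::max();
    if (x <= -2147483648.0f) return std::numeric_limits<std::int32_t>::min();
    return static_cast<std::int32_t>(x);  // in range, so the cast is well defined
}
```

Each `if` corresponds to extra compare and select work in a vectorized lowering on hardware whose native conversion behaves differently, which is the "number of operations" difference YD mentions.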

EN: Can we consider this behavior undefined?

PP: spec really tries to avoid undefined, I do support the idea that we might not need this operation

YD: maybe we should check what the use cases are in the wild, xnnpack, onnx runtime. Will open an issue, want to know what Andreas has to say about it

BD: xnnpack doesn’t seem to use this particular intrinsic (and it doesn't use autovectorization)