[Feature Request] Dawn C++ WebGPU backend #837
Comments
Here is an implementation of Stable Diffusion using WebGPU in Chrome, not Dawn. I found it interesting, but I don't know if it's useful for llama.cpp: https://github.com/mlc-ai/web-stable-diffusion
Agreed that Chrome makes more sense. If you want to run on a GPU locally, you should just run PyTorch; the whole point of llama.cpp is that it has no dependencies. I think running in the user's browser is a very interesting idea, but in practice it may be slow. Btw, the WebGPU API is constrained compared with CUDA, so I wonder whether you would get good performance.
Hello, I did a test with their implementation https://github.com/mlc-ai/web-llm, and my feeling is that the speed is maybe a little slower than with llama.cpp, but I only have an Iris Xe. What is interesting is that it recognizes my Intel card, which I don't think is easily possible with stock PyTorch. Could it be possible to use the GPU and CPU together in llama.cpp? Even as an option, it would be nice if it gained a few tokens.
I have tested it locally as well. It works pretty fast with a 4GB 4-bit quantized Vicuna 7B model. Web-llm is built on Apache TVM Unity (IRModule), compiled with Emscripten to WASM for the SentencePiece tokenizer. This natively supports WebGPU on different devices, but it's technologically challenging; keep in mind that web-llm comes from devs involved in TVM Unity development. The stack involves many components and is far from the simplicity of the GGML idea.
It's worth noting this naive GPT implementation in vanilla JavaScript that supports WebGPU.
Llama.cpp specifically targets the CPU, so it's unlikely such a dependency will be added, but see the discussion in #915. |
I've done a small first step towards that: |
Would WebGPU solve the 32-bit memory issue, since most of the layers/computations would move to GPU memory? #97
This issue was closed because it has been inactive for 14 days since being marked as stale. |
@ggerganov Hello, thanks to your GGUF release of Llama-3.2-1B-Instruct-Q4_K_M-GGUF, which is just 800MB and can easily be sharded into a few chunks, there is no need for WASM64. Could it be worth attempting to load it with WASM SIMD in the browser? Most browsers now in fact support SIMD and threads.
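For context, ggml already has a `__wasm_simd128__` code path, and the kind of kernel WASM SIMD gives you in the browser looks roughly like the sketch below, using Emscripten's `wasm_simd128.h` intrinsics and compiled with `emcc -msimd128` (the function name `dot_f32` is just illustrative, not something from llama.cpp):

```cpp
// Minimal WASM SIMD sketch: a float dot product with 128-bit lanes.
// Assumes compilation with `emcc -msimd128`; dot_f32 is a hypothetical name.
#include <wasm_simd128.h>
#include <stddef.h>

float dot_f32(const float *a, const float *b, size_t n) {
    v128_t acc = wasm_f32x4_splat(0.0f);           // 4 partial sums
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v128_t va = wasm_v128_load(a + i);         // load 4 floats from each input
        v128_t vb = wasm_v128_load(b + i);
        acc = wasm_f32x4_add(acc, wasm_f32x4_mul(va, vb));
    }
    float sum = wasm_f32x4_extract_lane(acc, 0)
              + wasm_f32x4_extract_lane(acc, 1)
              + wasm_f32x4_extract_lane(acc, 2)
              + wasm_f32x4_extract_lane(acc, 3);
    for (; i < n; ++i) sum += a[i] * b[i];         // scalar tail
    return sum;
}
```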
Today Chrome released WebGPU support in Chrome Beta.
Google's Dawn project is a standalone C++ implementation of WebGPU. It enables WebGPU support in other libraries; for example, this WIP provides Node.js bindings to Dawn, which would, in theory, enable WebGPU in Node.
So it should be possible to add Dawn as a GPU backend for Llama/GGML C++ math operations.
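As a rough illustration of what a first step might look like (not an actual patch), bringing up a Dawn device through the standard webgpu/webgpu.h header is roughly the sketch below. The callback signatures have changed across webgpu.h revisions; this assumes the older (status, handle, message, userdata) style, and relies on Dawn's native backends typically firing these request callbacks synchronously:

```cpp
// Minimal sketch: acquire a WebGPU adapter and device via Dawn's
// webgpu.h C header (valid C++). Assumes an older webgpu.h revision;
// newer headers use WGPURequestAdapterCallbackInfo structs instead.
#include <webgpu/webgpu.h>
#include <cassert>
#include <cstdio>

int main() {
    WGPUInstanceDescriptor inst_desc = {};
    WGPUInstance instance = wgpuCreateInstance(&inst_desc);

    // Request an adapter. With Dawn's native (non-browser) backends the
    // callback typically fires before RequestAdapter returns, so plain
    // userdata capture is enough for a sketch.
    WGPUAdapter adapter = nullptr;
    WGPURequestAdapterOptions opts = {};
    wgpuInstanceRequestAdapter(instance, &opts,
        [](WGPURequestAdapterStatus status, WGPUAdapter a,
           const char * /*msg*/, void *userdata) {
            if (status == WGPURequestAdapterStatus_Success)
                *static_cast<WGPUAdapter *>(userdata) = a;
        }, &adapter);
    assert(adapter && "no WebGPU adapter found");

    // Request a device from the adapter.
    WGPUDevice device = nullptr;
    WGPUDeviceDescriptor dev_desc = {};
    wgpuAdapterRequestDevice(adapter, &dev_desc,
        [](WGPURequestDeviceStatus status, WGPUDevice d,
           const char * /*msg*/, void *userdata) {
            if (status == WGPURequestDeviceStatus_Success)
                *static_cast<WGPUDevice *>(userdata) = d;
        }, &device);
    assert(device && "device creation failed");

    printf("Dawn device ready\n");
    return 0;
}
```

A real GGML backend would then map tensors to WGPUBuffer objects, express each op as a WGSL compute shader, and submit command buffers to the device queue.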