Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic deallocation by FinalizationRegistry does not work for fully synchronous code #3917

Closed
mizdra opened this issue Apr 6, 2024 · 8 comments

Comments

@mizdra
Copy link

mizdra commented Apr 6, 2024

Summary

This is neither a feature request nor a bug report. It is intended to share my experience with the developers of wasm-bindgen and its users.

FinalizationRegistry is an API that calls a callback when an object is deallocated.

wasm-bindgen has an opt-in feature to automatically deallocate WebAssembly memory using this API.

FinalizationRegistry is an effective way to automatically deallocate WebAssembly memory. However, it is not perfect.

Specifically, if you write code that is completely synchronous from the beginning to the end of your program, the automatic deallocation by FinalizationRegistry will not work. And it causes memory leaks.

How to reproduce

Repository: https://github.com/mizdra/repro-wasm-bindgen-weak-ref-memory-leak

src/lib.rs:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
#[derive(Debug)]
pub struct MyBuffer {
    data: Vec<u8>,
}

#[wasm_bindgen]
impl MyBuffer {
    pub fn new(size: usize) -> MyBuffer {
        let data = vec![0; size];
        MyBuffer { data }
    }
    pub fn get_data(&self) -> Vec<u8> {
        self.data.clone()
    }
}

sync.mjs:

import { MyBuffer } from './wasm/my-buffer.js';

for (let i = 0; i < Number.MAX_SAFE_INTEGER; i++) {
  MyBuffer.new(0xFFFF);
}
$ cargo install --force [email protected]
$ rustup target add wasm32-unknown-unknown
$ cargo build --release --target wasm32-unknown-unknown
$ wasm-bindgen --out-name my-buffer --target nodejs --out-dir wasm --weak-refs ./target/wasm32-unknown-unknown/release/my_buffer.wasm
$ npm i
$ node --wasm-max-mem-pages=3000 sync.mjs
wasm://wasm/00012936:1


RuntimeError: unreachable
    at __rg_oom (wasm://wasm/00012936:wasm-function[38]:0x37ad)
    at __rust_alloc_error_handler (wasm://wasm/00012936:wasm-function[54]:0x38e8)
    at alloc::alloc::handle_alloc_error::hcc94dbd37c0a15ab (wasm://wasm/00012936:wasm-function[59]:0x3925)
    at mybuffer_new (wasm://wasm/00012936:wasm-function[23]:0x341d)
    at MyBuffer.new (/Users/mizdra/src/github.com/mizdra/repro-wasm-bindgen-weak-ref-memory-leak/wasm/my-buffer.js:69:26)
    at file:///Users/mizdra/src/github.com/mizdra/repro-wasm-bindgen-weak-ref-memory-leak/sync.mjs:4:15
    at ModuleJob.run (node:internal/modules/esm/module_job:218:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:329:24)
    at async loadESM (node:internal/process/esm_loader:28:7)
    at async handleMainPromise (node:internal/modules/run_main:113:12)

Node.js v21.4.0

What happened?

This behavior is due to the FinalizationRegistry cleanup callback being called after the synchronous execution has completed.

For clarity, here is a simpler code.

minimum-repro/sync.mjs:

const registry = new FinalizationRegistry((i) => {
  console.log(`[i=${i}] cleanup callback is called.`);
});
class Foo {
  constructor(i) {
    console.log(`[i=${i}] Foo's constructor is called.`);
    registry.register(this, i);
  }
}
for (let i = 0; i < 3; i++) {
  new Foo(i);
  global.gc();
}

minimum-repro/async.mjs:

import { setTimeout } from 'timers/promises';

const registry = new FinalizationRegistry((i) => {
  console.log(`[i=${i}] cleanup callback is called.`);
});
class Foo {
  constructor(i) {
    console.log(`[i=${i}] Foo's constructor is called.`);
    registry.register(this, i);
  }
}
for (let i = 0; i < 3; i++) {
  new Foo(i);
  global.gc();
  await setTimeout(0);
}
$ node --expose-gc minimum-repro/sync.mjs
[i=0] Foo's constructor is called.
[i=1] Foo's constructor is called.
[i=2] Foo's constructor is called.
[i=2] cleanup callback is called.
[i=1] cleanup callback is called.
[i=0] cleanup callback is called.
$ node --expose-gc minimum-repro/async.mjs
[i=0] Foo's constructor is called.
[i=0] cleanup callback is called.
[i=1] Foo's constructor is called.
[i=1] cleanup callback is called.
[i=2] Foo's constructor is called.
[i=2] cleanup callback is called.

Calling global.gc() causes the Foo class to be garbage-collected. However, the cleanup callback is not called at that moment; the cleanup callback is called after the synchronous execution has completed. If the synchronous execution is interrupted in a loop, as in async.mjs, cleanup callback is called in the loop. However, if the synchronous execution is not interrupted in the loop, as in sync.mjs, cleanup callback is not called in the loop.

In wasm-bindgen, WebAssembly memory is deallocated in the cleanup callback. Therefore, if you write fully synchronous code like sync.mjs, the cleanup callback is not called and memory is not deallocated. And it causes memory leaks.

Why is cleanup callback called after synchronous execution?

This is based on the ECMAScript specification. The cleanup callback does not interrupt the synchronous execution.

https://262.ecma-international.org/14.0/?_gl=1*9k4t1i*_ga*MTY4NzY3NTcxOC4xNzEyMzEwNDI5*_ga_TDCK4DWEPP*MTcxMjMxMDQyOC4xLjAuMTcxMjMxMDQyOC4wLjAuMA..#sec-weakref-invariants

Neither of these actions (ClearKeptObjects or CleanupFinalizationRegistry) may interrupt synchronous ECMAScript execution. Because hosts may assemble longer, synchronous ECMAScript execution runs, this specification defers the scheduling of ClearKeptObjects and CleanupFinalizationRegistry to the host environment.

V8's test cases also support this.

Probably, other major JavaScript engines have the same behavior.

Workaround

There are several workarounds.

1. call the .free() method

You can free memory manually using the .free() method.

import { MyBuffer } from './wasm/my-buffer.js';

for (let i = 0; i < Number.MAX_SAFE_INTEGER; i++) {
  const buffer = MyBuffer.new(0xFFFF);
  buffer.free(); // Add this line
  global.gc();
}

2. Call other asynchronous APIs in the loop

Synchronous execution is interrupted in the loop so that cleanup callback is called.

import { MyBuffer } from './wasm/my-buffer.js';
import { setTimeout } from 'timers/promises';

for (let i = 0; i < Number.MAX_SAFE_INTEGER; i++) {
  MyBuffer.new(0xFFFF);
  await setTimeout(0); // Add this line
  global.gc();
}

Background of writing this document

I first recognized this behavior when I tried to implement automatic deallocation in @resvg/resvg-wasm (thx/resvg-js#318 (comment)). To make sure that automatic deallocation worked, I wrote code that repeatedly called the @resvg/resvg-wasm API using a loop (thx/resvg-js#318 (comment)). That code is completely synchronous. Therefore, it caused a memory leak.

It may be rare to write code that is completely synchronous. In addition, in my example, the code was not production code, but experimental code to check for memory leaks. However that doesn't mean it can't happen in production code. The users of wasm-bindgen may encounter this problem on rare occasions and spend a lot of time trying to figure out the cause. So I have written this document to help them.

Outro

If there is any input from maintainers on this behavior, I would like to know.

@mizdra mizdra changed the title Automatic deallocation by FinalizationRegistry does not work for fully synchronous code. Automatic deallocation by FinalizationRegistry does not work for fully synchronous code Apr 6, 2024
@yisibl
Copy link

yisibl commented Apr 6, 2024

Your survey is impressive, thanks so much for such a thorough test.

@mizdra
Copy link
Author

mizdra commented Apr 7, 2024

Further literature review suggests that a solution to this problem was attempted by FinalizationRegistry.prototype.cleanupSome proposal.

However, that proposal was withdrawn on 2023/9.

According to the meeting logs, it seems that it has been decided to attempt to resolve this problem with WebAssembly's JavaScript Promise Integration (JSPI).

Will this issue be resolved in #3633?

@daxpedda
Copy link
Collaborator

Interesting, if the conclusion is correct, which would seem intuitive to me (its called GC after all), it seems to me that there isn't really anything wasm-bindgen can do, right?

Its not clear to me yet if and how the JS Promise integration will be implemented, especially because its just a stopgap for the context switching proposal and isn't even stable yet (and only implemented in Chrome I believe).

@mizdra
Copy link
Author

mizdra commented Apr 25, 2024

Thank you for your views. My gut feeling is also that there is nothing wasm-bindgen can do.

Its not clear to me yet if and how the JS Promise integration will be implemented, especially because its just a stopgap for the context switching proposal and isn't even stable yet (and only implemented in Chrome I believe).

Thanks. That answer is good enough for now. I will check after the JSPI is implemented to see what it is and if it can solve this problem.

@annabelle
Copy link

It is not always feasible to use async/await, our code uses some synchronous parts and we currently inject a few awaits backed with setTimeout to let the event loop (and therefor the deallocator) some breathing room. This isn't ideal but it's the only thing we currently can do.

What can be done to help fix or reduce the impact of this issue? Appreciate it.

@daxpedda
Copy link
Collaborator

daxpedda commented Jul 8, 2024

What can be done to help fix or reduce the impact of this issue?

The conclusion was that there isn't anything that can really be done here from wasm-bindgens side.
Let me know if this was unclear somehow or if you have an idea what could be done.


I'm gonna go ahead and close the issue as this isn't really actionable.
This issue has a lot of valuable information in it, I'm sure we will get back to it one day when there is something we can actually do about this.

@daxpedda daxpedda closed this as not planned Won't fix, can't repro, duplicate, stale Jul 8, 2024
@annabelle
Copy link

Would like to understand whose side something can be done from then if not wasm-bindgen and where the issue could be resolved before closing. Appreciated. For us it causes memory to be over 10 times what is necessary. If you have already closed then same question.

@daxpedda
Copy link
Collaborator

daxpedda commented Jul 8, 2024

I believe this has been sufficiently answered in the OP (see the "Workaround" section) and follow-up posts.
What part is unclear?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants