flohofwoe 6 days ago

Tbh, the WASM Component Model is first and foremost an overengineered mess which probably will add more overhead than a handwritten JS shim just because it is so complex.

In the end you'll need to marshal datatypes from one language into another, and that is already a mess between 'native' languages (e.g. a C++ std::string is something entirely different from a Rust or Kotlin String).

So in that hypothetical native WASM DOM API, how do you pass something as simple as a string? Let's say the obvious solution would be a ptr/length pair, but then, what encoding: UTF-8, UTF-16, UTF-32? No matter what the solution is, you won't find a data representation that directly matches the string representation in all the languages that compile to WASM, so you'll need to do marshalling anyway before calling that hypothetical WASM DOM API.

And suddenly the current 'low-tech' solution of letting a JS shim extract the string data from the WASM heap and build a JS string before calling into a web API doesn't look so terrible anymore.
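
For reference, that 'low-tech' shim is only a few lines. This is a minimal sketch, assuming the module hands over a UTF-8 ptr/length pair; `readString` and the stand-in memory are made up for illustration and not taken from any particular toolchain:

```typescript
// Hypothetical shim helper: build a JS string from UTF-8 bytes in WASM linear memory.
const decoder = new TextDecoder("utf-8");

function readString(memory: WebAssembly.Memory, ptr: number, len: number): string {
  // View the bytes in place; decode() copies them into a new JS string.
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  return decoder.decode(bytes);
}

// Stand-in for a module's exported memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// Pretend the module wrote a string at offset 16.
const encoded = new TextEncoder().encode("h\u00e9llo");
new Uint8Array(memory.buffer).set(encoded, 16);

// The shim would pass this string on to whatever web API it wraps.
const title = readString(memory, 16, encoded.length);
```

The cost is one decode and one copy per call, which is the marshalling step that has to happen somewhere no matter which side of the boundary does it.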

A much more impactful change would be to add more WASM-friendly entry points to web APIs.

For instance, there's no reason that WebGPU is so 'JavaScript object heavy' or uses strings as enum values except that this is common in other JavaScript APIs. If WebGPU had additional "WASM-friendly" functions which use plain numbers (as object handles or enum values) a lot of the marshalling overhead when being called from WASM would simply go away.
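
To illustrate the enum part: the string values below are real WebGPU `GPUPrimitiveTopology` names, but the numeric codes and the `topologyFromWasm` helper are assumptions for this sketch. This is roughly the translation a shim has to do on every call today:

```typescript
// String enum values as WebGPU expects them on the JS side.
const PRIMITIVE_TOPOLOGY = [
  "point-list",
  "line-list",
  "line-strip",
  "triangle-list",
  "triangle-strip",
] as const;

type PrimitiveTopology = (typeof PRIMITIVE_TOPOLOGY)[number];

// Hypothetical shim helper: WASM passes an integer, the web API wants a string.
// The integer-to-name assignment here is arbitrary and only for illustration.
function topologyFromWasm(value: number): PrimitiveTopology {
  const name = PRIMITIVE_TOPOLOGY[value];
  if (name === undefined) {
    throw new RangeError(`unknown topology code ${value}`);
  }
  return name;
}

const topology = topologyFromWasm(3);
```

With a numeric entry point on the API itself, the lookup table and the string comparison inside the browser would both disappear.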

lenkite 6 days ago

Calling it an over-engineered mess when it solves a complex problem through well-defined specification and semantics is very poor posturing. The WASM specs are actually FAR easier to read and grok than most of the legacy Web APIs! Many of these problems are already being solved.

For strings, the rules are:

    The string is encoded as UTF-8 by default (UTF-16 and Latin-1+UTF-16 are available as canonical ABI options).

    The ABI lowers it into a (ptr: u32, len: u32) pair in linear memory.

    The receiving component or adapter then uses the (ptr, len) to read the UTF-8 bytes and convert it back into its own string representation.
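
The three rules above can be sketched as a round trip. This is an illustration only: `lowerString`, `liftString` and the bump allocator are hypothetical names, and a real component would allocate through the guest's exported `realloc` rather than a counter:

```typescript
// Stand-in for a component's linear memory (one 64 KiB page).
const linearMemory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical bump allocator, used only to keep the sketch self-contained.
let nextFree = 8;

// Lower: encode as UTF-8, copy into linear memory, hand back (ptr, len).
function lowerString(s: string): { ptr: number; len: number } {
  const bytes = new TextEncoder().encode(s);
  const ptr = nextFree;
  nextFree += bytes.length;
  new Uint8Array(linearMemory.buffer).set(bytes, ptr);
  return { ptr, len: bytes.length }; // len counts bytes, not characters
}

// Lift: read len bytes at ptr and convert to the receiver's own string type.
function liftString(ptr: number, len: number): string {
  const bytes = new Uint8Array(linearMemory.buffer, ptr, len);
  // fatal: true rejects invalid UTF-8 instead of silently replacing it.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const lowered = lowerString("na\u00efve");
const roundTripped = liftString(lowered.ptr, lowered.len);
```

A receiver whose native string type is already UTF-8 can skip the decode and just validate and copy (or borrow) the bytes, which is where the "no extra cost" argument comes from.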

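Modern languages that already have UTF-8 strings don't need to pay extra costs. Even old languages like Java are moving toward UTF-8 now: it has been the default charset since JDK 18 (JEP 400), although compact strings still store Latin-1 or UTF-16 internally rather than UTF-8.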

No, the JS shim way is the terrible mess that adds a ridiculous amount of overhead and a truckload of hacks to obtain limited performance. Just have a look at the code for wasm-bindgen for the very definition of messy engineering and awkward workarounds to get things running.