Bytecode VMs in surprising places (2024)(dubroy.com)
48 points by azhenley 3 days ago | 19 comments
chirsz an hour ago | parent | next [-]

SBus peripherals use the Forth language in their PROMs to initialize themselves[1].

[1] https://docs.oracle.com/cd/E19957-01/802-3239-10/sbusandfc.h...

DonHopkins an hour ago | parent [-]

Good call! (Whether it's a directly threaded, indirectly threaded, subroutine threaded, token threaded, Huffman threaded, or string threaded call.)

https://en.wikipedia.org/wiki/Threaded_code#Token_threading

Mitch Bradley created OpenFirmware, which was originally called ForthMacs or Sun Forth, running on Sun 68k, SPARC, and Amiga. It was based on Laxen and Perry's F83 (Forth 83), and has a metacompiler that can target and cross-compile to many platforms, word sizes, CPUs, and threading models, produce stripped ROMable images, etc.

The metacompiler made it possible to support different target platforms and interpreter styles, running native CPU, Direct Threaded Code, and bytecode like FCode simultaneously.

So the core of the Forth system runs fast native code with 32-bit threaded code field address pointers, and it can also run portable FCode from the ROMs of expansion cards.

The FCode gets re-tokenized at boot/probe time into the live native DTC dictionary.

https://github.com/MitchBradley

https://github.com/MitchBradley/openfirmware

At the time the Forth bytecode was developed, Sun had different instruction sets (68k, SPARC, x86) that they wanted to support with the same peripherals (typically SBus, like the Sun386i "Roadrunner"). But later it became an IEEE standard that other platforms adopted, like PowerPC Macs, and the OLPC.

Interview with Mitch Bradley (he's like the Woz of Forth):

https://web.archive.org/web/20120118132847/http://howsoftwar...

Years before OpenFirmware bytecode, Mitch also developed an extremely portable C-Forth, which uses a token-threaded interpreter (16 or 32 bit, configurable) built on a switch statement, plus a small, hand-rolled FFI built around a fixed-arity argument-marshalling trampoline:

https://github.com/MitchBradley/cforth
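For anyone who hasn't seen token threading: a minimal sketch of the idea in Python, not cforth's actual code. The opcode names and numbering here are made up for illustration, and an if/elif chain stands in for the C switch statement.

```python
# Hypothetical token set for illustration; cforth's real tokens differ.
LIT, ADD, MUL, DUP, HALT = range(5)

def run(code):
    """Execute a list of integer tokens and return the data stack."""
    stack, ip = [], 0
    while True:
        op = code[ip]          # fetch the next token
        ip += 1
        if op == LIT:          # the following cell is an inline literal
            stack.append(code[ip])
            ip += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"unknown token {op}")

# Forth-style "2 3 + dup *", i.e. (2 + 3) squared
program = [LIT, 2, LIT, 3, ADD, DUP, MUL, HALT]
print(run(program))  # prints [25]
```

The program is just an array of small integers, which is why the same image can be carried across CPUs: only the dispatch loop has to be ported.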

OpenFirmware even has its own song:

https://www.youtube.com/watch?v=b8Wyvb9GotM

More on Mitch and OpenFirmware:

https://news.ycombinator.com/item?id=21822840

https://news.ycombinator.com/item?id=33681531

https://news.ycombinator.com/item?id=38689282

magnat 2 hours ago | parent | prev | next [-]

Some other examples:

- ACPI configuration for power management and platform stuff [1]

- Bitcoin transactions [2]

- TrueType fonts [3]

[1] https://wiki.osdev.org/AML

[2] https://en.bitcoin.it/wiki/Script

[3] https://learn.microsoft.com/en-us/typography/opentype/spec/t...

m132 27 minutes ago | parent [-]

Since we're bringing up ACPI, let's not forget about EFI!

https://uefi.org/specs/UEFI/2.10/22_EFI_Byte_Code_Virtual_Ma...

superjan an hour ago | parent | prev | next [-]

How about the infamous iOS hack with a VM implemented in a JBIG2 PDF? https://projectzero.google/2021/12/a-deep-dive-into-nso-zero...

pratikdeoghare 3 hours ago | parent | prev | next [-]

There is one in golang regular expressions https://swtch.com/~rsc/regexp/regexp2.html

I guess that is why you say regexp.Compile.

rhdunn 3 hours ago | parent | next [-]

That goes back to Ken Thompson's NFA regex interpreter from 1968 [1], [2], [3]. Note: that whole regex series by Russ Cox [4] is great.

[1] https://dl.acm.org/doi/10.1145/363347.363387 -- Programming Techniques: Regular expression search algorithm

[2] https://swtch.com/~rsc/regexp/regexp1.html -- Regular Expression Matching Can Be Simple And Fast

[3] https://swtch.com/~rsc/regexp/regexp2.html -- Regular Expression Matching: the Virtual Machine Approach

[4] https://swtch.com/~rsc/regexp/ -- Implementing Regular Expressions
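To make the "regex as VM" idea concrete, here is a rough Python sketch in the spirit of the instruction set described in [3] (char, split, jmp, match), run in lockstep over all threads. It is not Go's regexp implementation; the tuple encoding and the function names are my own assumptions, and it only does whole-string matching with no submatches.

```python
CHAR, SPLIT, JMP, MATCH = "char", "split", "jmp", "match"

def add_thread(prog, threads, pc):
    """Add pc to the thread set, following jmp/split without consuming input."""
    if pc in threads:
        return                      # already scheduled; also stops jmp loops
    threads.add(pc)
    op = prog[pc]
    if op[0] == JMP:
        add_thread(prog, threads, op[1])
    elif op[0] == SPLIT:
        add_thread(prog, threads, op[1])
        add_thread(prog, threads, op[2])

def matches(prog, text):
    """Return True if the whole text is matched by the program."""
    current = set()
    add_thread(prog, current, 0)
    for ch in text:
        nxt = set()
        for pc in current:
            op = prog[pc]
            if op[0] == CHAR and op[1] == ch:
                add_thread(prog, nxt, pc + 1)
        current = nxt               # every thread advances one character
    return any(prog[pc][0] == MATCH for pc in current)

# Hand-compiled program for the regex a+b
A_PLUS_B = [
    (CHAR, "a"),     # 0
    (SPLIT, 0, 2),   # 1: loop back for another 'a', or go on to 'b'
    (CHAR, "b"),     # 2
    (MATCH,),        # 3
]

print(matches(A_PLUS_B, "aab"))  # prints True
print(matches(A_PLUS_B, "b"))    # prints False
```

Because the thread set can never hold more entries than there are instructions, the run time is bounded by program size times input length, with no backtracking.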

kqr an hour ago | parent [-]

I second the Russ Cox recommendation. I read that ages ago and that was what made me realise some theory could actually be useful in practice.

pjc50 3 hours ago | parent | prev | next [-]

All regular expressions can be converted to deterministic finite automata https://en.wikipedia.org/wiki/Deterministic_finite_automaton (finally, a use for my CS course); the extent to which that counts as a virtual machine varies. Some of the regex syntaxes extend it in ways which don't fit in a DFA and do count as a VM; Perl-compatible RE used to be popular (e.g. in Exim).

titzer an hour ago | parent | next [-]

It's easier to construct NFAs directly from regular expression definitions (rather than DFAs) because implementing the choice operator is easier. We can convert from NFA to DFA with worst-case exponential blowup.
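A small sketch of that blowup, using the textbook subset construction in Python. To keep it short I'm assuming an NFA with no epsilon moves, and the example language ("the n-th symbol from the end is an a") and helper names are my own choices for illustration.

```python
from collections import deque

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps (state, symbol) -> set of next states.
    Returns (set of reachable DFA states, DFA transition table).
    """
    start_set = frozenset([start])
    seen = {start_set}
    dfa = {}
    queue = deque([start_set])
    while queue:
        cur = queue.popleft()
        for sym in alphabet:
            # A DFA state is the set of NFA states reachable on this symbol
            nxt = frozenset(t for s in cur for t in nfa.get((s, sym), ()))
            dfa[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, dfa

def nth_from_last_is_a(n):
    """NFA with n + 1 states for (a|b)*a(a|b)^(n-1); state n is accepting."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        nfa[(i, "a")] = {i + 1}
        nfa[(i, "b")] = {i + 1}
    return nfa

for n in range(1, 5):
    states, _ = nfa_to_dfa(nth_from_last_is_a(n), 0, "ab")
    print(n + 1, "NFA states ->", len(states), "DFA states")
```

The NFA grows by one state per step while the DFA doubles, because the DFA has to remember which of the last n symbols were an a.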

anthk 3 hours ago | parent | prev [-]

Indeed:

https://wiki.xxiivv.com/site/rewriting.html

sureglymop 3 hours ago | parent | prev [-]

Interesting. Not that surprising that it works like this. But isn't it a little surprising that things like regexes, printf syntax and other DSLs aren't mostly handled and parsed at compile time in 2026?

pjc50 2 hours ago | parent [-]

Kind of language-dependent since regexes are normally specified as strings and most languages are pretty weak at "run this code at compile time". One of the things Rust users are fond of.

C# is in the middle on this one, where specific features get compile-time support and regex is one of them: https://www.devleader.ca/2026/05/03/c-regex-performance-gene...

I have also built a C# source generator myself (XML parser generator), but the developer experience is a bit of a hill to climb compared to what it could be.

ivankelly 3 hours ago | parent | prev | next [-]

Quake had its own VM also

self_awareness 3 hours ago | parent | prev | next [-]

RarVM was used in a previous version of the format; the newest RAR has removed it, and RarV5 doesn't have a VM.

omeid2 3 hours ago | parent | prev | next [-]

This list is entirely incomplete without mentioning Java Card.

There is a tiny Java Bytecode VM in an insanely large list of places, you can find some of them here:

https://github.com/crocs-muni/javacard-curated-list https://en.wikipedia.org/wiki/Java_Card

anthk 2 hours ago | parent | prev | next [-]

yt-dlp's jsinterp.py

https://jxself.org/compiling-the-trap.shtml

I've got subleq+eforth (https://github.com/howerj/muxleq) running in JS which is dead simple to do. No input but I could output ASCII mapping values to an array.

https://esolangs.org/wiki/Subleq
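To show how little there is to it, a sketch of a subleq machine in Python (not the muxleq code). I'm assuming one common I/O convention where a B operand of -1 means "output the cell at A as a character", input is left out, and a negative program counter halts; the step limit is my own safety net.

```python
def run_subleq(program, max_steps=10_000):
    """Run a subleq program (list of ints) and return its output as a string."""
    mem = list(program)
    out = []
    pc = 0
    steps = 0
    while 0 <= pc and pc + 2 < len(mem) and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        if b == -1:
            # Assumed convention: writing to address -1 outputs mem[a]
            out.append(chr(mem[a]))
            pc += 3
        else:
            # The one real instruction: subtract, then branch if result <= 0
            mem[b] -= mem[a]
            pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return "".join(out)

# Prints two characters stored at cells 9 and 10, then halts by jumping to -1
hello = [
    9, -1, 3,     # output mem[9]
    10, -1, 6,    # output mem[10]
    0, 0, -1,     # mem[0] -= mem[0] gives 0, so jump to -1 (halt)
    72, 105,      # data: 'H', 'i'
]
print(run_subleq(hello))  # prints Hi
```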

So, yes. yt-dlp runs proprietary YouTube JS code, defying the original purpose.

ignoramous 3 hours ago | parent | prev [-]

TikTok shipping XOR cipher'd bytecode & interp is right up there: https://news.ycombinator.com/item?id=34109771

pjc50 3 hours ago | parent [-]

VM for obfuscation is a whole thing. Denuvo has a particularly complicated one https://connorjaydunn.github.io/blog/posts/denuvo-analysis/

Other game examples using VMs not for obfuscation: Z-machine and SCUMM-VM.