Remix.run Logo
Show HN: A small programming language where everything is pass-by-value(github.com)
45 points by jcparkyn 4 hours ago | 23 comments

This is a hobby project of mine that I started a few years ago to learn about programming language implementation. It was created 95% without AI, although a few recent commits include code from Gemini CLI.

I started out following Crafting Interpreters, but gradually branched off that until I had almost nothing left in common.

Tech stack: Rust, Cranelift (JIT compilation), LALRPOP (parser).

Original title: "A small programming language where everything is a value" (edited based on comments)

augusteo an hour ago | parent | next [-]

The threading story here is what grabbed my attention. Pass-by-value with copy-on-write means you get data-race immunity without any locks or channels. You just pass data to a thread and mutations stay local. That's a genuinely useful property.

I've worked on systems where we spent more time reasoning about shared state than writing actual logic. The typical answer is "just make everything immutable" but then you lose convenient imperative syntax. This sits in an interesting middle ground.

Curious about performance in practice. Copy-on-write is great until you hit a hot path that triggers lots of copies. Have you benchmarked any real workloads?

rao-v 3 minutes ago | parent | next [-]

Why don’t we just do this by default for threading in most languages? It’s pretty rare for me to actually want to do memory sharing while threading (mostly because of the complexity)

jcparkyn 13 minutes ago | parent | prev | next [-]

> Have you benchmarked any real workloads?

Nothing "real", just the synthetic benchmarks in the ./benchmarks dir.

Unnecessary copies are definitely a risk, and there's certain code patterns that are much harder for my interpreter to detect and remove (e.g. in the nbodies example with modifying dicts stored in arrays).

The other big performance limitation with my implementation is just the cost of atomic reference counting, and that's the main tradeoff versus deep cloning to pass between threads. There would definitely be room to improve this with better reference counting optimizations though.

sheepscreek 39 minutes ago | parent | prev | next [-]

Hmm this is a bit like peeling a banana only to throw the banana and eat the peel. Pass by value reduces the true benefit of copy-on-write.

Use immutable pass by reference. Make a copy only if mutability is requested in the thread. This makes concurrent reads lock-free but also cuts down on memory allocations.

doug-moen 14 minutes ago | parent | next [-]

I think that what you are calling "immutable pass by reference" is what the OP is calling "pass by value". See, when used abstractly, "pass by value" means that the argument is passed as a value, hence it is immutable and the callee can't mutate it. One way to implement this is by copying the data that represents the value. In the OP's language, and in many other languages that work this way, instead of copying the data, we implement "pass by value" by incrementing the reference count and passing a pointer to the original data. These differing implementations provide the same abstract semantics, but differ in performance.

jcparkyn 19 minutes ago | parent | prev [-]

> Use immutable pass by reference. Make a copy only if mutability is requested in the thread.

This is essentially what Herd does. It's only semantically a pass by value, but the same reference counting optimizations still apply.

In fact, Herd's approach is a bit more powerful than this because (in theory) it can remove the copy entirely if the caller doesn't use the old value any more after creating the thread. In practice, my optimizations aren't perfect and the language won't always detect this.

The big downside is that we have to use atomic reference counts for _everything_. From memory this was about a 5-15% performance hit versus non-atomic counters, though the number might be higher if other bottlenecks were removed.

jagged-chisel 12 minutes ago | parent | prev [-]

Pass-by-value is already a copy.

jcparkyn 8 minutes ago | parent [-]

It's only semantically a pass-by-value, in reality a reference is passed and the data is only copied if needed (i.e. value is mutated while shared).

fjfaase 2 hours ago | parent | prev | next [-]

I have implemented similar behavior in some of my projects. For one, I also have also implemented 'cursors' that point to some part of a value bound to a variable and allow you to change that part of the value of the variable. I have used this to implement program transformations on abstract parse (syntax) trees [1]. I also have implemented a dictionary based on a tree where only part of the tree is modified that needs to be modified [2]. I have also started working on a language that is based on this, but also attempts to add references with defined behavior [3].

[1] https://github.com/FransFaase/IParse/?tab=readme-ov-file#mar...

[2] https://www.iwriteiam.nl/D1801.html#7

[3] https://github.com/FransFaase/DataLang

discarded1023 2 hours ago | parent | prev | next [-]

At the risk of telling you what you already know and/or did not mean to say: not everything can be a value. If everything is a value then no computation (reduction) is possible. Why? Because computation stops at values. This is traditional programming language/lambda calculus nomenclature and dogma. See Plotkin's classic work on PCF (~ 1975) for instance; Winskel's semantics text (~ 1990) is more approachable.

Things of course become a lot more fun with concurrency.

Now if you want a language where all the data thingies are immutable values and effects are somewhat tamed but types aren't too fancy etc. try looking at Milner's classic Standard ML (late 1970s, effectively frozen in 1997). It has all you dream of and more.

In any case keep having fun and don't get too bogged in syntax.

jcparkyn 2 hours ago | parent | next [-]

Thanks, some interesting reading there that I will check out (I wasn't aware of PCF). Perhaps I should've used more precise wording: "All types are value types".

> Standard ML [...] It has all you dream of and more

The main thing here that's missing in Standard ML (and most other functional languages) is the "mutable" part of "mutable value semantics" - i.e., the ability to modify variables in-place (even nested parts of complex structures) without affecting copies. This is different from "shadowing" a binding with a different value, since it works in loops etc.

doug-moen 2 hours ago | parent | prev [-]

I am unable to extract any meaning from your post. You appear to be making a general claim: it is impossible to design a programming language where everything is a value. You at least admit that "data thingies" can be values. Are you claiming that it is not possible for functions to be values? (If we assume that the argument and the result of a function call is a value, then this would mean higher order functions are impossible, for example.) If not that, then what? Please give a specific example of something that can never be a value in any programming language that I care to design.

gf000 an hour ago | parent [-]

I think parent means it from a lambda calculus perspective. If you only have values at an AST level, then you only have a tree of.. values, like an XML document.

You can apply meaning to a particular shape of that tree which could be executed, but then you basically just added another layer before you parse your AST that becomes executable.

ekipan 2 hours ago | parent | prev | next [-]

(Edit: in the old post title:) "everything is a value" is not very informative. That's true of most languages nowadays. Maybe "exclusively call-by-value" or "without reference types."

I've only read the first couple paragraphs so far but the idea reminds me of a shareware language I tinkered with years ago in my youth, though I never wrote anything of substance: Euphoria (though nowadays it looks like there's an OpenEuphoria). It had only two fundamental types. (1) The atom: a possibly floating point number, and (2) the sequence: a list of zero or more atoms and sequences. Strings in particular are just sequences of codepoint atoms.

It had a notion of "type"s which were functions that returned a boolean 1 only if given a valid value for the type being defined. I presume it used byte packing and copy-on-write or whatever for its speed boasts.

https://openeuphoria.org/ - https://rapideuphoria.com/

p1necone 2 hours ago | parent | next [-]

> It had a notion of "type"s which were functions that returned a boolean 1 only if given a valid value for the type being defined.

I've got a hobby language that combines this with compile time code execution to get static typing - or I should say that's the plan, it's really just a tokenizer and half of a parser at the moment - I should get back to it.

The cool side effect of this is that properly validating dynamic values at runtime is just as ergonomic as casting - you just call the type function on the value at runtime.

jcparkyn an hour ago | parent | prev [-]

Thanks, I updated the post title based on this and another comment. Thanks for the pointer to Euphoria too, looks like an interesting language with a lot of similar ideas.

jbritton an hour ago | parent | prev | next [-]

The article mentions shallow copy, but does this create a persistent immutable data structure? Does it modify all nodes up the tree to the root?

jcparkyn 40 minutes ago | parent [-]

Yes, if you modify a nested dict/list entry, all nodes above it will be cloned. Here's an example:

  x = [1, [2]];
  var y = x;
  set y.[0] = 3; // clones the outer array, keeps a reference to inner array
  set y.[1].[0] = 4; // clones the inner array here. Outer array is now exclusive so it doesn't need another clone.

  var z = x;
  set z.[1].[0] = 4; // clones both arrays at once
rvba 2 hours ago | parent | prev | next [-]

> In herd, everything is immutable unless declared with var

So basucally everything is var?

jcparkyn 2 hours ago | parent [-]

I'm not sure if I understand the question?

There are two ways to define a variable binding:

    x = 1; // declares x as immutable
    var y = 2; // declares y as mutable
The "default" behaviour (if no keyword is used) is to define a new immutable variable.
drnick1 2 hours ago | parent | prev [-]

Small programming language with everything passed by value? You reinvented C?

jcparkyn 2 hours ago | parent [-]

Not everything in C is pass-by-value. Sure, you can argue that a pointer itself is passed by value, but the data it points to is definitely not.

globalnode 19 minutes ago | parent [-]

cool project. can you take the address of a variable in some way? i.e. implement your own pointers if its really really needed?