tikhonj 2 hours ago

From what I've heard—and in my own very limited experiments—LLMs are much better at less popular languages than I would have expected. I've had good results with OCaml, and I've talked to people who've had good results with Haskell and even Unison.

I've also seen multiple startups that have had some pretty impressive performance with Lean and Rocq.

My current theory is that as long as the LLM has sufficiently good baseline performance in a language, the scaffolding and tooling you build around the raw code generation has an outsized effect, and languages with expressive type systems have a pretty direct advantage there: types constrain the output and give your system immediate feedback, letting you iterate LLM generation faster and at a higher level than you otherwise could.
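
As a rough illustration of what that scaffolding can look like, here is a minimal sketch in TypeScript: generate code, type-check it with `tsc`, and feed the compiler diagnostics back into the prompt. `callModel` is a hypothetical stand-in for whatever LLM API you use; the `tsc` invocation itself is standard.

```typescript
// Sketch of a "types as feedback" loop: generate, type-check, re-prompt on errors.
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

async function generateTyped(
  callModel: (prompt: string) => Promise<string>, // your LLM call (assumption)
  prompt: string,
  maxAttempts = 3,
): Promise<string> {
  let feedback = "";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const code = await callModel(prompt + feedback);
    writeFileSync("candidate.ts", code);
    try {
      // --noEmit: only type-check, don't produce JS
      execFileSync("npx", ["tsc", "--noEmit", "--strict", "candidate.ts"]);
      return code; // type-checks: accept the candidate
    } catch (err: any) {
      // tsc's diagnostics become the feedback for the next attempt
      feedback =
        "\n\nThe previous attempt failed to type-check:\n" +
        (err.stdout?.toString() ?? String(err));
    }
  }
  throw new Error(`no well-typed candidate after ${maxAttempts} attempts`);
}
```

The same loop works with any compiler that gives structured errors (ocamlfind, ghc, lake for Lean, etc.); the richer the type system, the more the compiler can catch before you ever run anything.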

I recently saw a paper[1] about using types to directly constrain LLM output. The paper used TypeScript, but the same approach seems like it would carry over to other typed languages. Approaches like that make generating typed code with LLMs even more promising.

Abstract:

> Language models (LMs) can generate code but cannot guarantee its correctness, often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure.

> We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.

[1]: https://arxiv.org/abs/2509.00360
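
For intuition, the core loop of property-constrained decoding looks roughly like the sketch below. This is a deliberately simplified caricature, not ChopChop itself: `scoreTokens` stands in for the model's next-token distribution, and `isRealizable` for the paper's realizability analysis ("can this prefix still be completed into a program with the desired property?").

```typescript
// Greedy constrained decoding: only extend the prefix with tokens that keep it realizable.
type Token = string;

function decodeConstrained(
  scoreTokens: (prefix: Token[]) => Map<Token, number>, // model logits (assumption)
  isRealizable: (prefix: Token[]) => boolean,           // semantic prefix check (assumption)
  isComplete: (prefix: Token[]) => boolean,
  maxLen = 512,
): Token[] {
  let prefix: Token[] = [];
  while (!isComplete(prefix) && prefix.length < maxLen) {
    const ranked = [...scoreTokens(prefix).entries()]
      .sort((a, b) => b[1] - a[1]); // highest-probability token first
    const next = ranked.find(([tok]) => isRealizable([...prefix, tok]));
    if (!next) throw new Error("no realizable continuation from this prefix");
    prefix = [...prefix, next[0]];
  }
  return prefix;
}
```

The hard part, which the paper is actually about, is making `isRealizable` both sound and cheap enough to run inside the decoding loop.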

torginus an hour ago | parent | next [-]

You might be right, but take into account that (I think) you're not deeply familiar with these languages yourself, so you might not notice all the warts a programmer with a lot of experience in them would, and end up overrating the LLM's skill.

Nowadays I write C# and TS at work, and it's absolutely crazy how much better the LLM is at TS: almost all the code is decent on the first try, while with C# I need to do a lot of massaging.

miki123211 an hour ago | parent | prev [-]

As a huge proponent of constrained decoding for LLM reliability, I don't think it's quite the right approach for code. In many programming languages it is legal to use a function before its declaration; since that pattern is common in existing code, LLMs will also try to write code that way, and a decoder that insists every prefix be well-typed as it is generated can't accept a call to a function that hasn't been emitted yet.
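
For example, this is perfectly legal (strict-mode) TypeScript, but a strictly left-to-right type-constrained decoder would have to reject the call to `helper` at the moment it is generated, because the declaration doesn't exist yet:

```typescript
// Legal TypeScript: function declarations are hoisted, so `helper` may be
// referenced before it appears in the file.
export function main(): number {
  return helper(21); // forward reference to a not-yet-declared function
}

function helper(x: number): number {
  return x * 2;
}
```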