solomonb 4 hours ago

I'm biased by my preferred style of programming languages, but I think that pure, statically typed functional languages are incredibly well suited to LLMs. Purity gives you referential transparency and static-analysis power that the LLM can leverage to stay on task.

The high level declarative nature and type driven development style of languages like Haskell also make it really easy for an experienced developer to review and validate the output of the LLM.

Early on in the GPT era I had really bad experiences generating Haskell code with LLMs but I think that the combination of improved models, increased context size, and agentic tooling has allowed LLMs to really take advantage of functional programming.

eru 4 hours ago | parent | next [-]

I'm inclined to agree with you in principle, but there are far fewer Haskell examples in the training corpus than JavaScript or Python ones.

tikhonj 19 minutes ago | parent | next [-]

From what I've heard—and in my own very limited experiments—LLMs are much better at less popular languages than I would have expected. I've had good results with OCaml, and I've talked to people who've had good results with Haskell and even Unison.

I've also seen multiple startups that have had some pretty impressive performance with Lean and Rocq.

My current theory is that as long as the LLM has sufficiently good baseline performance in a language, the kind of scaffolding and tooling you can build around the pure code generation will have an outsize effect, and languages with expressive type systems have a pretty direct advantage there: types can constrain and give immediate feedback to your system, letting you iterate the LLM generation faster and at a higher level than you could otherwise.
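As a minimal TypeScript sketch of that feedback loop (hypothetical names; `ParseResult`, `parseInteger`, and `describe` are my own illustrations, not from any library): a precise discriminated union constrains what any generated implementation can do, and the compiler rejects a version that forgets a case, giving the system immediate, structured feedback.

```typescript
// A precise result type: success and failure can't be confused,
// and invalid states are unrepresentable.
type ParseResult =
  | { kind: "ok"; value: number }
  | { kind: "error"; message: string };

function parseInteger(input: string): ParseResult {
  const n = Number(input);
  return Number.isInteger(n)
    ? { kind: "ok", value: n }
    : { kind: "error", message: `not an integer: ${input}` };
}

// Any consumer is forced by the type checker to handle both cases;
// an LLM-generated version that drops the "error" branch fails to
// compile, and that error message can be fed straight back to it.
function describe(r: ParseResult): string {
  switch (r.kind) {
    case "ok":
      return `got ${r.value}`;
    case "error":
      return r.message;
  }
}
```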

I recently saw a paper[1] about using types to directly constrain LLM output. The paper used TypeScript, but it seems like the same approach would work well with other typed languages as well. Approaches like that make generating typed code with LLMs even more promising.

Abstract:

> Language models (LMs) can generate code but cannot guarantee its correctness, often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure.

> We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.

[1]: https://arxiv.org/abs/2509.00360
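A toy TypeScript sketch of the general idea (this is NOT ChopChop's actual algorithm, and `realizable`/`constrain` are made-up names): at each step, filter the model's candidate tokens down to those that keep the prefix completable under the grammar, here a tiny language of the form NUM (OP NUM)*.

```typescript
type Token = "num" | "+" | "*";

// A prefix is "realizable" iff it can still extend to a valid
// program: tokens must strictly alternate num, op, num, op, ...
function realizable(prefix: Token[]): boolean {
  return prefix.every((t, i) => (i % 2 === 0 ? t === "num" : t !== "num"));
}

// Constrained decoding in miniature: keep only the candidate next
// tokens that leave the prefix realizable, so generation can never
// paint itself into an invalid corner.
function constrain(prefix: Token[], candidates: Token[]): Token[] {
  return candidates.filter((t) => realizable([...prefix, t]));
}
```

The real system reasons over program structure and semantic properties rather than a fixed regular pattern, but the shape of the loop is the same: prune the token space before sampling instead of rejecting whole outputs afterwards.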

solomonb 4 hours ago | parent | prev | next [-]

You are right that there is significantly more JavaScript in the training data, but I can say from experience that I'm a little shocked at how well Opus 4.5 has been for me writing Haskell. I'm fairly particular and I end up rewriting a lot of code for style reasons, but it can often one-shot an acceptable solution that is mostly in line with the rest of the codebase.

joelthelion 2 hours ago | parent | prev | next [-]

For the little Haskell I've done with LLMs, I can tell you they're not bad at it.

Actually, Haskell was a bit too hard for me to take on alone for real projects. Now, with AI assistants, I think it could be a great pick.

energy123 3 hours ago | parent | prev | next [-]

True for now, but probably not a durable fact. Synthetic data pipelines should be mostly invariant to the programming language, as long as the output is correct. If anything, the additional static analysis makes these languages more amenable to synthetic data generation.

eru 3 hours ago | parent [-]

> Synthetic data pipelines should be mostly invariant to the programming language, as long as the output is correct.

Well, you can adapt your PHP-producing pipeline to produce Haskell code that is correct in the sense of solving the problem at hand, but getting it to produce idiomatic code is probably a lot harder.

tikhonj 16 minutes ago | parent [-]

I think the trick with Haskell is that you can write types in such a way that the APIs that get generated are idiomatic and designed well. The implementations of individual functions might be messy or awkward, but as long as those functions are relatively small—which is how I tend to write my non-AI-based Haskell code anyhow!—it's not nearly as important.
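The same idea can be sketched in TypeScript (hypothetical API; in Haskell you would reach for newtypes and `Data.List.NonEmpty` instead): encode invariants in the types so any implementation, human- or LLM-written, is pushed toward a sound design, and keep each function small enough that a messy body is cheap to rewrite.

```typescript
// A non-empty list type: "head of an empty list" is unrepresentable.
type NonEmpty<T> = [T, ...T[]];

// Total function: no runtime emptiness check needed, the type
// already rules out the bad input.
function head<T>(xs: NonEmpty<T>): T {
  return xs[0];
}

// Small, pure functions: even if a generated body is awkward,
// it's a few lines reviewed and replaced in isolation.
function maximum(xs: NonEmpty<number>): number {
  return xs.reduce((a, b) => Math.max(a, b));
}
```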

kstrauser 4 hours ago | parent | prev [-]

And yet the models I've used have been great with Rust, which pales next to JavaScript (or Python or PHP or Perl or C or C++) in sheer volume of code.

eru 4 hours ago | parent [-]

I've also had decent experiences with Rust recently. I haven't done enough Haskell programming in the AI era to really say.

But it could be that different programming languages are a bit like different human languages for these models: once they have more than some threshold of training data, they can express their general problem-solving skills in any of them? And then it's down to how much the compiler and linters can yell at them.

For Rust, I regularly tell them to make `clippy::pedantic` happy (and tell me explicitly when they think that the best way to do that is via an explicit ignore annotation in the code to disable a certain warning for a specific line).

Pedantic clippy is usually too, well, pedantic for humans, but it seems to work reasonably well with agents. You can also add clippy::cargo, which isn't included in clippy::pedantic.
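For concreteness, one way to wire this up (lint configuration only; adjust the specific `allow`ed lint to taste):

```shell
# Enable the pedantic and cargo lint groups for a single run:
cargo clippy -- -W clippy::pedantic -W clippy::cargo

# Or make them the project default by adding to the crate root
# (src/lib.rs or src/main.rs):
#   #![warn(clippy::pedantic, clippy::cargo)]
# and silence one lint at a specific site when the agent justifies it:
#   #[allow(clippy::module_name_repetitions)]
```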

solomonb 4 hours ago | parent [-]

> But it could be that different programming languages are a bit like different human languages for these models: when they have more than some threshold of training data, they can express their general problem solving skills in any of them? And then it's down to how much the compiler and linters can yell at them.

I think this is exactly right.

jaggederest 2 hours ago | parent [-]

Exactly my opinion - I think the more you lock down the search space by providing strong, opinionated tooling, the better LLMs perform. I think of it like simulated annealing: one run just hunting for a correct solution, versus the same run using heuristics and bounds to narrow the solution space.

keeda 2 hours ago | parent | prev [-]

It's not just your bias, I too have found great success with a functional programming style, even from the earliest days of ChatGPT. (Not Haskell, but JS, which the models were always good at.)

I think the underlying reason is that functional programming is very conducive to keeping the context tight and focused. For instance, most logic relevant to a task tends to be concentrated in a few functions and data structures across a smallish set of files. That's all you need to feed into the context.

Contrast that with, say, Java, where the logic is often spread across a deep inheritance hierarchy located in a bunch of separate files. Add to that large frameworks that encapsulate a lot of boilerplate and bespoke logic, with magic injected from arbitrary places via e.g. annotations. You'd need to load all of those files (or, more likely, the whole codebase) plus the relevant documentation to get accurate results. And even then the additional context is not just extraneous and expensive, but also polluted with irrelevant data that actually reduces accuracy.

A common refrain of mine is that for the best results, you have to invest a lot of time experimenting AND adapting yourself to figure out what works best with AI. In my case, that meant gradually shifting to a functional style after spending my whole career writing OO code.