Maro 6 days ago

I'm currently working as a hands-off VP, so I don't use LLMs for coding at work, only for emails and document editing. I do use them for my hobby weekend coding, which usually involves writing short 100-1000 LOC Python toy programs for my own education/entertainment. My way of working is quite primitive: I have zero integrations, nothing agentic, I just copy/paste with ChatGPT.

For this use case it's been very useful: it can usually generate close-to-complete solutions, as long as it's one of the major programming languages and the problem is reasonably standard. So in general I'm always surprised when people say that LLMs are completely useless for coding --- this is just not true, and I feel sorry for people who shut themselves off from a useful tool.

But even at this small scale, even the best models (o3) sometimes totally fail. Recently I started a series of posts on distributed algorithms [1], and when I was working on the post/code for the Byzantine Generals / Consensus algorithm, o3 --- to my honest surprise --- just totally failed. I tried about 10 different times (both from scratch and by describing the incorrect behaviour of its code), and also showed it the original Lamport paper, but it just couldn't get it right... even though the toy implementation is only ~100 LOC, and the actual algorithm portion is maybe 25 LOC. My hypothesis is that there are very few implementations online, and additionally I find the standard descriptions of the algorithm a bit vague (the message cascade and the decision logic are interleaved).
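For context, here's roughly the shape of that ~25 LOC: a minimal sketch of Lamport's OM(m) "oral messages" recursion in Python (my own simplified framing, not the code from my post; in particular, traitors here just flip the bit they relay, whereas a real Byzantine node could send different values to different recipients):

    def majority(values):
        # Ties default to "retreat" (0), per the usual convention.
        return 1 if sum(values) * 2 > len(values) else 0

    def om(m, commander, lieutenants, value, traitors):
        """Return {lieutenant: decided value} after one run of OM(m)."""
        # Step 1: the commander sends its value to every lieutenant
        # (a traitorous commander flips the bit it sends).
        sent = {lt: (1 - value) if commander in traitors else value
                for lt in lieutenants}
        if m == 0:
            return sent  # base case: use the value as received

        # Step 2: each lieutenant relays what it heard by acting as the
        # commander in a recursive OM(m-1) run among the other lieutenants.
        relayed = {}  # relayed[j][i] = what lieutenant i heard from j's sub-run
        for j in lieutenants:
            others = [o for o in lieutenants if o != j]
            relayed[j] = om(m - 1, j, others, sent[j], traitors)

        # Step 3: each lieutenant decides by majority over its own value and
        # the values relayed by the other lieutenants.
        return {i: majority([sent[i]] +
                            [relayed[j][i] for j in lieutenants if j != i])
                for i in lieutenants}

    # 4 generals, 1 traitor (general 2), loyal commander 0 orders "attack" (1):
    print(om(1, 0, [1, 2, 3], 1, traitors={2}))  # loyal lieutenants 1 and 3 decide 1

With 4 generals and 1 traitor (the n > 3m boundary case), the loyal lieutenants should end up agreeing on the loyal commander's order.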

[1] https://bytepawn.com/tag/distributed.html