Remix.run Logo
akersten an hour ago

> The code it generated was awful. The kind of garbage that people who don’t know any better would ship: it looked right and it worked. But it was instantly a maintenance dead end.

In the Tailwind thread the other day I was explicitly told that the intended experience of many frameworks is "write-only code" so maybe this is just the way of the future that we have to learn to embrace. Don't worry how it's all hooked up, if it works it works and if it stops working tell the AI to fix it.

It's kind of liberating I guess. I'm not sure if I've reached AI nirvana on accepting this yet, but I do think that moment is close.

simmerup an hour ago | parent | next [-]

The problem is it’s impossibly hard to test all the edge cases

Which is probably why so many random buttons in microsoft/apple/spotify just stop working once you get off the beaten path or load the app in some state which is slightly off base

marcosdumay 20 minutes ago | parent | next [-]

The problem is worse than that.

The number of edge cases in a software is not fixed at all. One of the largest markers of competence in software development is being able to keep them at minimum, and LLMs tend to make that number higher than humanely possible.

disgruntledphd2 an hour ago | parent | prev [-]

Yeah, the biggest thing I've noticed from LLMs is that large tech products now have even more bugs. Turns out the humans weren't so bad after all...

louiereederson an hour ago | parent | next [-]

I'm wondering if companies are 'diverting' engineering resources from core products to AI products with the view that the former are legacy. Kind of two sides of the same coin though.

michaelcampbell an hour ago | parent | prev [-]

> Turns out the humans weren't so bad after all...

The people pushing AI _over_ humans never thought they were. They just don't care about 'good' or 'bad', only 'time-to-market'. A bad app making money is better than a good one that isn't deployed yet. And who cares about anything past the end of the quarter? That's the next guy's problem.

an hour ago | parent [-]
[deleted]
giancarlostoro an hour ago | parent | prev [-]

Easy, have Claude review the code, tell it to be critical and that it needs to be easier to understand, follow Clean Code, SOLID principles and best practices. Lie to it, say you got this from a Junior developer, or "review it as if you were a Staff Level Engineer reviewing Junior code" the models can write better code, just nobody tells them to.

HappySweeney an hour ago | parent | next [-]

Code review is the main thing I use LLMs for. I have found it to be remarkably candid when you tell it the code came from another LLM (even name it). I was running Kimi K2.6 Q4 locally, seeing if it could SIMD a bit-matrix transpose function, and it was slow enough that I would paste its thinking into Gemini every few minutes. Gemini was savage.

datsci_est_2015 5 minutes ago | parent [-]

> Gemini was savage.

Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.

marcosdumay 25 minutes ago | parent | prev | next [-]

Lol, the only thing worse than a junior developer following Clean Code and SOLID has to be an LLM messing with code so it looks like it follows.

giancarlostoro 6 minutes ago | parent [-]

Clean Code has its really "meh" areas, but the core idea and spirit of it is sound, heck Python's best guide is PEP-8 if you follow that, it forces you to write much better Python code.

In terms of "junior dev following" it would be the model trying to think and write it as a Senior or Staff Level engineer would.

kenjackson 41 minutes ago | parent | prev | next [-]

This is it. I've had a similar experience in just playing around I asked it to clean up some code it wrote to increase maintainability and readability by humans. After a few iterations it had generated quite solid code. It also broke the code a couple of times along the way. But it does get me thinking that these pipelines with agents doing specific tasks makes a lot of sense. One to design and architect, one to implement, one to clean, one to review, one to test (actually there's probably a bunch of different agents for testing -- testing perf/power, that it matches the requirements/spec, matches the design, is readable/maintainable, etc...).

giancarlostoro 36 minutes ago | parent [-]

I built GuardRails after some frustrations with Beads which I love, and this whole exchange made me realize, because I have "gates" after tasks, I could add a "Review the code" type of gate, and probably get insanely better output, I already get reasonably good output because I spec out the requirements beforehand, that's the other thing, if you can tell the LLM HOW to build before it does, you will have better output.

enraged_camel 32 minutes ago | parent | prev [-]

Even better, if you have access to multiple models, tell it you got the code from another AI agent.

I did an experiment on this a few weekends ago and Codex for example was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"

giancarlostoro 31 minutes ago | parent [-]

If it's within its context window, it will know you're lying, so either compact or start a new chat (don't do this on Claude, it dings your usage, always has).