overgard 7 hours ago

I asked Codex to write some unit tests for Redux today. At first glance it looked fine, and I continued on. I then went back to add a test by hand, and after looking more closely at the output there were like 50 wtf-worthy things scattered in there. Sure they ran, but it was bad in all sorts of ways. And this was just writing something very basic.

This has been my experience almost every time I use AI: superficially it seems fine, once I go to extend the code I realize it's a disaster and I have to clean it up.

The problem with "code is cheap" is that it's not. GENERATING code is now cheap (while the LLMs are subsidized by endless VC dollars, anyway), but the cost of owning that code is not. Every line of code is a liability, and generating thousands of lines a day is like running up a few thousand dollars of debt on a credit card, thinking you're getting free stuff, and then being surprised when it gets declined.

acemarke an hour ago | parent | next [-]

Hi, I'm the primary Redux maintainer. I'd love to see some examples of what got generated! (Doubt there's anything we could do to _influence_ this, but curious what happened here.)

FWIW we do have our docs on testing approaches here, and have recommended a more integrated-style approach to testing for a while:

- https://redux.js.org/usage/writing-tests
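
Roughly, the idea is to dispatch real actions against a real store and assert on the observable state, rather than testing reducer internals in isolation. A minimal sketch of that principle (with a hand-rolled store stand-in and a made-up counter reducer, so it runs with no dependencies; real code would use configureStore from @reduxjs/toolkit):

```javascript
// Hand-rolled stand-in for a Redux store so this sketch has no dependencies.
// In real code you'd use configureStore from @reduxjs/toolkit.
function createStore(reducer) {
  let state = reducer(undefined, { type: '@@INIT' });
  return {
    getState: () => state,
    dispatch: (action) => { state = reducer(state, action); },
  };
}

// A made-up counter reducer standing in for real app logic.
function counter(state = { value: 0 }, action) {
  switch (action.type) {
    case 'counter/incremented':
      return { value: state.value + 1 };
    default:
      return state;
  }
}

// Integration-style test: create a FRESH store, dispatch real actions,
// and assert on the observable state rather than on reducer internals.
function testCounterIncrements() {
  const store = createStore(counter); // fresh store per test
  store.dispatch({ type: 'counter/incremented' });
  store.dispatch({ type: 'counter/incremented' });
  if (store.getState().value !== 2) throw new Error('counter did not increment');
  return 'ok';
}

console.log(testCounterIncrements()); // 'ok'
```

(The reducer and action type are illustrative; the linked docs cover the real setup.)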

acedTrex 7 hours ago | parent | prev | next [-]

I've always said every line is a liability; it's our job to limit liabilities. That has largely gone out the window these days.

elgenie 4 hours ago | parent | next [-]

No code is as easy to maintain as no code.

No code runs as fast as no code.

chr15m 5 hours ago | parent | prev | next [-]

> every line is a liability, it's our job to limit liabilities.

Hard agree!

cs_sorcerer 5 hours ago | parent | prev | next [-]

Followed by: even better than code is no code, and best of all is deleting code.

It's one of those things that has always struck me as funny about programming: how less usually really is more.

nomel 6 hours ago | parent | prev [-]

The only people I've known that share this perspective are those that hate abstraction. Going back to their code to extend it in some way almost always requires a rewrite, because they wrote it with the goal of minimum viable complexity rather than understanding the realities of the problem they're solving, like "we all know we need these other features, but we have a deadline!"

For one-offs, this is fine. For anything maintainable, that needs to survive the realities of time, this is truly terrible.

Related: my friend works in a performance-critical space. He can't use abstractions, because the direct, bare-metal, "exact fit" implementation will perform best. They can't really add features, because it'll throw the timing of other things off too much, so they usually have to re-architect. But that's the reality of their problem space.

jahsome 3 hours ago | parent | next [-]

I don't see how the two are related, personally. I'm regularly accused of over-abstraction specifically because I aspire to make each abstraction do as little as possible, i.e. fewest lines possible.

nomel 3 hours ago | parent [-]

I call that lasagna code! From what I've seen, developers start with spaghetti, overcompensate with lasagna, then end up with an organization that's more optimized for the human and minimizes cognitive load while reading.

To me, abstraction is an encapsulation of some concept. I can't understand how they're practically different, unless you encapsulate true nonsense, without purpose or resulting meaning, and I can't think of an example of that, since humans tend to categorize/name everything. I'm dumb.

johnmwilkinson 6 hours ago | parent | prev [-]

I believe this is conflating abstraction with encapsulation. The former is about semantic levels, the latter about information hiding.

nomel 6 hours ago | parent [-]

Maybe I am? How is it possible to abstract without encapsulation? And also, how is it possible to encapsulate without abstracting some concept (intentionally or not) contained in that encapsulation? I can't really differentiate them, in the context of naming/referencing some list of CPU operations.

Retric 5 hours ago | parent [-]

> How is it possible to abstract without encapsulation?

Historically, pure machine code with jumps etc. lacked any form of encapsulation, as any data could be accessed and updated by anything.

However, you would still use abstractions. If you pretend the train is actually going 80.2 MPH, instead of somewhere between 80.1573 MPH and 80.2485 MPH as reported by different sensors, you don't need to do every calculation that follows twice.
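
A sketch of that idea in code (the sensor values are hypothetical): a currentSpeed() function is the abstraction everything downstream uses, yet the raw readings remain fully accessible, so there is no encapsulation:

```javascript
// Hypothetical readings from several speed sensors, in MPH.
const sensorReadings = [80.1573, 80.2485, 80.19];

// The abstraction: everything downstream asks for one "speed" number.
// There is no encapsulation -- sensorReadings is still reachable by any
// code that wants it; the abstraction holds only by convention.
function currentSpeed() {
  const sum = sensorReadings.reduce((a, b) => a + b, 0);
  return Math.round((sum / sensorReadings.length) * 10) / 10; // one decimal place
}

console.log(currentSpeed()); // 80.2
```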

nomel 4 hours ago | parent [-]

I'm using the industry definition of abstraction [1]:

> In software, an abstraction provides access while hiding details that otherwise might make access more challenging

I read this as "an encapsulation of a concept". In software, I think it can be simplified to "named lists of operations".

> Historically pure machine code with jumps etc lacked any from of encapsulation as any data can be accessed and updated by anything.

Not practically, by any stretch of the imagination. And if the intent is to write silly code, modern languages don't really change much; it's just that the named lists of operations will be longer.

You would use calls and returns (or just jumps if not supported), and then name and reference the resulting subroutine in your assembler or with a comment (so you could reference it as "call 0x23423 // multiply R1 and R2"), to encapsulate the concept. If those weren't supported, you would use named macros [2]. Your assembler would use named operations, sometimes expanding to multiple opcodes, with each opcode having a conceptually relevant name in the manual, which abstracted a logic circuit made of named logic gates, consisting of named switches, that shuffled around named charge carriers. Say your code just did a few operations: the named abstraction for that list of operations (which all of these things are) would be "blink_light.asm".

> If you pretend the train is actually going 80.2 MPH instead of somewhere between 80.1573 MPH to 80.2485 MPH which you got from different sensors you don’t need to do every calculation that follows twice.

I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.

I see what you're saying, and you're probably right, but I see the concepts as equivalent. I see an abstraction as a functional encapsulation of a concept. An encapsulation, if not nonsense, will be some meaningful abstraction (or a renaming of one).

I'm genuinely interested in an example of an encapsulation that isn't an abstraction, and an abstraction that isn't a conceptual encapsulation, to right my perspective! I can't think of any.

[1] https://en.wikipedia.org/wiki/Abstraction_(computer_science)

[2] https://www.tutorialspoint.com/assembly_programming/assembly...

Retric 4 hours ago | parent [-]

> I can't think of any.

Incorrect definition = incorrect interpretation. I edited this a few times, but the separation is: you can use an abstraction even if you maintain access to the implementation details.

> assembler

Assembly language, which is a different thing. Initially there was no assembler; someone had to write one. In the beginning, every line of code had direct access to all memory, in part because limiting access required extra engineering.

Though even machine code itself is an abstraction across a great number of implementation details.

> I don't see this as an abstraction as much as a simple engineering compromise (of accuracy) dictated by constraint (CPU time/solenoid wear/whatever), because you're not hiding complexity as much as ignoring it.

If it makes you feel better, consider the same situation with 5 sensors, X of which have failed. The point is you don't need to consider all information at every stage of a process. Instead of all the underlying details, you can write code that asks: do we have enough information to get a sufficiently accurate speed? What is it?

It doesn't matter if the code could still look at the raw sensor data; you, the programmer, prefer the abstraction, so it persists even without anything beyond yourself enforcing it.

I.e.: "hiding details that otherwise might make access more challenging"

You can use TCP/IP or anything else as an abstraction even if you maintain access to the lower level implementation details.

nomel 3 hours ago | parent [-]

I genuinely appreciate your response, because there's a good chance it'll result in me changing my perspective, and I'm asking these questions with that intent!

> You are thinking of assembly language which is a different thing. Initially there was no assembler, someone had to write one.

This is why I specifically mention opcodes. I've actually written assemblers! And... there's not much to them. It's mostly just replacing the names given to the opcodes in the datasheet back with the opcodes, plus a few human niceties. ;)
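
To illustrate how little there is to one, a toy sketch (the mnemonics and opcodes here are made up): the core is just a lookup table from name back to opcode:

```javascript
// A toy single-pass "assembler": at its core, a lookup table from the
// datasheet's mnemonics back to opcodes. Mnemonics and opcodes are invented.
const OPCODES = { NOP: 0x00, LDA: 0x3a, ADD: 0x80, JMP: 0xc3 };

function assemble(lines) {
  return lines.flatMap((line) => {
    const [mnemonic, operand] = line.trim().split(/\s+/);
    const bytes = [OPCODES[mnemonic]];
    if (operand !== undefined) bytes.push(parseInt(operand, 16)); // hex operand
    return bytes;
  });
}

console.log(assemble(['LDA 10', 'ADD', 'JMP 00'])); // [ 58, 16, 128, 195, 0 ]
```

(Real assemblers add labels, a second pass for forward references, and expressions, but the name-to-opcode mapping is the heart of it.)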

> consider the same situation with 5 senators X of which have failed

Ohhhhhhhh, ok. I kind of see. Unfortunately, I don't see the difference between abstraction and encapsulation here. I see the abstraction, speed, as being the encapsulation of a set of sensor readings, ignoring irrelevant values.

I feel like I'm almost there. I may have edited my previous comment after you replied. My "no procrastination" setting kicked in, and I couldn't see.

I don't see how "The former is about semantic levels, the latter about information hiding." makes them different. In my mind, semantic levels exist as compression and encapsulation of information. If you're saying encapsulation means "black box", then that could make sense to me, but "inaccessible" isn't part of the definition, just "containment".

Retric 3 hours ago | parent [-]

> It's mostly just replacing the names given to the opcodes in the datasheet back to the opcodes

Under the assumption that the input data is properly formatted, you can generate machine code. This, however, is an abstraction that can fail, as nothing forces a user to input valid files.

So we have an abstraction without any encapsulation.

visarga an hour ago | parent | prev | next [-]

> "write some unit tests for Redux today"

The equivalent of "draw me a dog" -> not a masterpiece!? Who would have thought? You need to come up with a testing methodology, write it down, and then ask the model to go through it. It likes to make assumptions about unspecified things, so you've got to be careful.

More fundamentally, I think testing is becoming the core component we need to think about. We should not vibe-check AI code, we should code-check it. Of course it will write the actual test code, but your main priority is to think about "how do I test this?"

You can only know the value of code up to the level of its testing. You can't commit your eyes into the repo, so don't do "LGTM" vibe-testing of AI code; that's walking a motorcycle.

cheema33 4 hours ago | parent | prev | next [-]

This is how you do things if you are new to this game.

Get two other, different LLMs to thoroughly review the code. If you don't have an automated way to do all of this, you will struggle and eventually put yourself out of a job.

If you do use this approach, you will get code that is better than what most software devs put out. And that gives you a good base to work with if you need to add polish to it.

overgard 2 hours ago | parent | next [-]

I actually have used other LLMs to review the code in the past (not today, but in the past). It's fine, but it doesn't tend to catch things like "this technically works, but it's loading a footgun." For example, in the Redux tests I mentioned in my original post, the tests were reusing a single global store variable. It technically worked, the tests ran, and since these were the first tests I introduced in the code base there weren't any issues, even though this made the tests non-deterministic... but it was a pattern that was easily going to break down the line.
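
A sketch of that anti-pattern (a made-up counter reducer and a hand-rolled store stand-in, so it runs without the redux package): each test mutates the same module-level store, so results depend on execution order:

```javascript
// Hand-rolled store stand-in so this runs without the redux package.
function createStore(reducer) {
  let state = reducer(undefined, { type: '@@INIT' });
  return {
    getState: () => state,
    dispatch: (action) => { state = reducer(state, action); },
  };
}

// Made-up counter reducer.
function counter(state = 0, action) {
  return action.type === 'incremented' ? state + 1 : state;
}

// ANTI-PATTERN: one module-level store shared by every test.
const store = createStore(counter);

function testFirstIncrement() {
  store.dispatch({ type: 'incremented' });
  return store.getState(); // 1 -- but only if this test runs first
}

function testSecondIncrement() {
  store.dispatch({ type: 'incremented' });
  return store.getState(); // 2 here, but 1 if run alone: order-dependent
}

console.log(testFirstIncrement(), testSecondIncrement()); // 1 2
```

The fix is simply to construct a fresh store inside each test (or in a beforeEach), so every test starts from a known state.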

To me, the solution isn't "more AI", it's "how do I use AI in a way that doesn't screw me over a few weeks/months down the line", and for me that's by making sure I understand the code it generated and trim out the things that are bad/excessive. If it's generating things I don't understand, then I need to understand them, because I have to debug it at some point.

Also, in this case it was just some unit tests, so who cares, but if this was a service that was publicly exposed on the web? I would definitely want to make sure I had a human in the loop for anything security related, and I would ABSOLUTELY want to make sure I understood it if it were handling user data.

timcobb 2 hours ago | parent | prev | next [-]

> you will struggle and eventually put yourself out of a job.

We can have a discussion without the stakes being so high.

summerlight 3 hours ago | parent | prev | next [-]

The quality of generated code does not matter. The problem is when it breaks at 2 AM and you're burning thousands of dollars every minute. You don't own code that you don't understand, but unfortunately that does not mean you don't own the responsibility. Good luck writing the postmortem; your boss will have lots of questions for you.

3kkdd 4 hours ago | parent | prev [-]

I'm sick and tired of these empty posts.

SHOW AN EXAMPLE OF YOU ACTUALLY DOING WHAT YOU SAY!

alt187 2 hours ago | parent | next [-]

There's no example because OP has never done this, and never will. People lie on the internet.

timcobb 2 hours ago | parent [-]

I've never done this because I haven't felt compelled to; I want to review my own code. But I imagine this works okay and isn't hard to set up by asking Claude to set it up for you...

Foreignborn 3 hours ago | parent | prev [-]

these two posts (the parent and then the OP) seem equally empty?

by level of compute spend, it might look like:

- ask an LLM in the same query/thread to write code AND tests (not good)

- ask the LLM in different threads (meh)

- ask the LLM in a separate thread to critique said tests (too brittle, violating testing guidelines, testing implementation and not observable behavior, etc). fix those. (decent)

- ask the LLM to spawn multiple agents to review the code and tests. Fix those. Spawn agents to critique again. Fix again.

- Do the same as above, but spawn agents from different families (so Claude calls Gemini and Codex).

—-

these are usually set up as /slash commands like /tests or /review so you aren’t doing this manually. since this can take some time, people might work on multiple features at once.

akst 5 hours ago | parent | prev | next [-]

ATM I feel like LLMs writing tests can be a bit dangerous at times; there are cases where it's fine and cases where it's not. I don't really think I could articulate a systemised basis for identifying either case, but I know it when I see it, I guess.

Like the other day, I gave it a bunch of use cases to write tests for; the use cases were correct, the code was not. It saw one of the tests break, so it sought to rewrite the test. You risk suboptimal results when an agent is dictating its own success criteria.

At one point I did try to use separate Claude instances to write tests, then I'd get the other instance to write the implementation, unaware of the tests. But it's a bit too much setup.

sjsizjhaha 7 hours ago | parent | prev | next [-]

Generating code was always cheap. That’s part of the reason this tech has to be forced on teams. Similar to the move to cloud, it’s the kind of cost that’s only gonna show up later - faster than the cloud move, I think. Though, in some cases it will be the correct choice.

zamalek 7 hours ago | parent | prev [-]

The main issue I've seen is it writing passing tests; the code being correct is a big (and often incorrect) assumption.

0x696C6961 6 hours ago | parent [-]

The majority of devs do the same thing.