Against vibes: When is a generative model useful (williamjbowman.com)
57 points by takira a day ago | 8 comments
andai 15 minutes ago | parent | next [-]

> What is the cost of verifying the generated artifact meets requirements vs. a directly produced artifact? This is mostly a function of the task and the user, but also the generative model.

So this is the fun one for programming.

I let AI agents do some programming on my codebases, but then I had to spend more time catching up with their changes.

So first I was bored waiting for them to finish, and then I was confused and frustrated making sense of the result.

Whereas when I ask the AI for small things like "edit this function so it does this instead", and accept changes manually, my mental model stays synced the whole time. And I can stay active and in flow.

(Also for such fine grained tasks, small fast cheap models are actually superior because they allow realtime usage. Even small latency makes a big difference.)

smilindave26 2 hours ago | parent | prev | next [-]

> For almost all software I write, I do care about the process. I’m typically designing software as part of research, and me doing the design and implementation work creates knowledge that I will then share.

Similar here. For a lot of software I write, I don't really know what the essential "abstraction" I need is until I'm actively writing it. The answers, when I get them right, look obvious in retrospect. Sometimes, starting with Claude Code, I can get there, but my mindset is that I'm using this tool to generate software that helps me immerse myself in the problem space. It changes the pace of the process: sometimes it speeds me up, sometimes I end up taking bad concepts a lot further than I normally would before getting to the better path.

neonstatic an hour ago | parent [-]

I agree it's a different process. Personally, I do not enjoy it. If I get code wrong or the solution I came up with is clunky, I'm okay with starting over. At least I learned something valuable. With Claude, I get irritated, frustrated, and frankly just really tired. I feel like I've been burning hour after hour of my precious time trying to explain something to a machine that just doesn't understand, cannot understand, and the output of that process is just disappointing. I find that I don't trust the code it produces and I don't have it in me to even read that code. I never felt that way about code written by me or another person.

I will admit that Claude has been helpful as an assistant (especially with syntax I am not familiar with), but as a programmer that does things for me, it's been awful. YMMV.

Btw, a week of doing that (treating Claude as a programmer who does things for me) did help me in a way. I now have an intuitive understanding of what it means that these things are not intelligent. I am now certain that an LLM doesn't understand anything. It seems to be able to map text to some representations and then see if these representations match or compose. I know this might sound like intelligence, but in practice it's just not enough. Pattern recognition, sure. Not intelligence. Not even close.

qsera 3 hours ago | parent | prev | next [-]

>The scientific version of these claims is “the total encoding cost (for some class of tasks) is lower than previous models”

I wonder why? Can the new models read minds?

> For example, I was recently trying to install a package whose name I forgot. I prompted the model to “install that x11 fake gui thing”, a trivial prompt.

Yes, they are a better search.

I would add that there is also a subjective factor. If I enjoy writing code a lot more than reviewing it, I am going to prefer NOT to use the model for writing and might just use it to review.

So "hardness" is also related to how much you like/dislike doing it.

SOLAR_FIELDS 2 hours ago | parent [-]

It does feel like with each new frontier model release, the major improvement I notice is that the model is, in fact, getting better at reading your mind. What I mean is that it gets better at understanding the nuance and subtleties of your intent, and at teasing out what you actually want. It takes less and less input for the model to build a world around. So in a significant way, yes, newer models are reading your mind: they are probabilistically getting better at modeling how most humans communicate in natural language and filling in the gaps.

Re writing code: most people find the writing of code to be a chore. For those who don't, I don't envy them, because that is the part that just got completely destroyed by AI. It's becoming pretty abundantly clear that if you enjoy hand-writing code, it will be a hobby rather than something you can do professionally while competing with people who aren't writing by hand.

andai 10 minutes ago | parent [-]

Yeah, they have more "common sense", though not as much as I'd like. I used to think Opus was big, but after using it a lot, I think it should actually be a lot bigger. The difference from Sonnet to Opus is really noticeable, but the difference from Opus to human (in common sense) is also massive. I expect as the hardware improves, we'll see 3-10x bigger models become the default.

Small models are making great strides of course, and perhaps we will soon learn to distill common sense ;) but subtlety and nuance appear physically bound to parameter count...

adampunk 13 hours ago | parent | prev | next [-]

This is basically the right approach, framed as critique. Success with these models means engaging in detail with their work, persistently and at all scales. You need attention to detail, the ability to evaluate (independently of the model), and mechanisms for enforcing all that. In a word: engineering.

But because people get all bent out of shape I prefer to call it vibe coding anyway.

7777777phil 21 hours ago | parent | prev [-]

I particularly like this framework: how hard is it to describe the task vs. how hard is it to check the output.
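
The framework discussed in the thread can be sketched as a toy cost comparison: delegate to a model only when describing the task plus verifying its output is expected to be cheaper than doing the work directly. The function name and all cost figures below are hypothetical illustrations, not anything from the linked article.

```python
def should_delegate(describe_cost: float, verify_cost: float,
                    direct_cost: float) -> bool:
    """Return True if delegating to a generative model looks cheaper
    than producing the artifact directly. Costs are in arbitrary
    units (e.g. minutes of your time)."""
    return describe_cost + verify_cost < direct_cost

# A small, well-specified edit: cheap to describe, cheap to check.
print(should_delegate(describe_cost=1, verify_cost=2, direct_cost=10))  # True

# Novel research code: the spec *is* the work, and review is costly.
print(should_delegate(describe_cost=8, verify_cost=7, direct_cost=10))  # False
```

The subjective factor raised upthread fits naturally here too: if you dislike reviewing, your effective `verify_cost` goes up, and delegation becomes less attractive even when the raw time saved looks good.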