pron 6 days ago

The experience was that I once asked an LLM to write a simple function and it produced something very wrong that nothing with good reasoning abilities should ever do. Of course, a drunk or very tired human could have made the same mistake, but they would at least have told me that they were impaired and unsure of their work.

I agree that most of the time it does most simple tasks mostly right, but that's not good enough to truly "offload" my mental effort. Again, I'm not saying it's not useful, but rather than working with a junior developer, it's like working with a junior developer who may or may not be drunk or tired and doesn't tell you.

But mostly my point is that LLMs seem to be worse at logical reasoning than at things like generating prose or summarising a document. Of course, even then you can't trust them yet.

> But then I do all that for all code anyway, including my own

I don't, at least not constantly. I review other people's code only towards the very end of a project, and in between I trust them to tell me about any pertinent challenge or insight, precisely so that I can focus on other things unless they draw my attention to something I need to think about.

I still think that working with a coding assistant is interesting and even exciting, but the experience of not being able to trust anything is, for me at least, unlike working with another person or with a tool, and it doesn't yet allow me to focus on other things. Maybe with more practice I could learn to work with something I can't trust at all.

darkerside 6 days ago | parent | next [-]

> working with a junior developer who may or may not be drunk or tired and doesn't tell you.

Bad news, friend.

Overall though, I think you're right. It's a lot like working with people. The things you might be missing are that you can get better at this with practice, and that once you are multiplexing multiple Claudes, you can become hyper-efficient. These are things I'm looking into now.

Do I know these for a fact? Not yet. But, as with any tool, I don't expect the investment to pay off right away.

kenjackson 6 days ago | parent | prev [-]

What was the simple function?

throwaway31131 6 days ago | parent | next [-]

I’m not sure what their simple function was, but as practice in LLM use I tried to use Claude to recreate C++ code implementing the algorithms in this paper, and it didn't go well. But I'll be the first to admit that I'm probably holding it wrong.

https://users.cs.duke.edu/~reif/paper/chen/graph/graph.pdf

pron 6 days ago | parent | prev [-]

Can't remember, but it was something very basic: a 10-15-line routine that a first-year student who knew the relevant API would write in 3 minutes. The reason I asked the model in the first place is that I didn't know the API. If memory serves, the model inverted an if or a loop condition.
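
To make that concrete, here's a hypothetical sketch of the kind of bug I mean, not the actual routine (as I said, I can't remember it): a plain linear search, first correct, then with the condition inverted.

    #include <cstddef>
    #include <vector>

    // Returns the index of `target` in `v`, or -1 if absent.
    int find_index(const std::vector<int>& v, int target) {
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (v[i] == target) return static_cast<int>(i);  // correct test
        }
        return -1;
    }

    // Illustrative inverted-condition variant: `!=` instead of `==`
    // returns the first index that does NOT match. Wrong for almost
    // every input, yet it compiles and looks plausible at a glance.
    int find_index_inverted(const std::vector<int>& v, int target) {
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (v[i] != target) return static_cast<int>(i);  // inverted test
        }
        return -1;
    }

A bug like that is trivial to catch in review, but only if you already know you have to review everything.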

p1esk 6 days ago | parent | next [-]

Did you use one of the latest frontier reasoning models? If not, how is your experience relevant?

totallykvothe 6 days ago | parent | next [-]

In what world is this an appropriate thing to say to someone?

p1esk 5 days ago | parent | next [-]

In the world where you do not claim that LLMs suck today based on your attempt to use some shitty model three years ago.

guappa 6 days ago | parent | prev [-]

In the creed of "AI is perfect, if you claim otherwise you're broken" that so many here embrace.

jama211 6 days ago | parent | prev [-]

So you tried it once and then gave up?

pron 6 days ago | parent [-]

I didn't give up; I just know that I can only use a model when I have the patience to work with something I can't trust on anything at all. So that's what I do.

jama211 5 days ago | parent [-]

Sounds like the spirit of my question remains intact