wordofx 14 hours ago

I still haven’t found anyone for whom AI wouldn’t be helpful or trustworthy enough. People make the /claim/ that it’s not useful or that they’re better without it. When you sit down with them, it often turns out they just don’t know how to use AI effectively.

RamblingCTO 14 hours ago | parent | next [-]

No, AI is just garbage. I asked AI a clear-cut question about battery optimization in Zen. It told me Zen is based on Chrome, but it's based on Firefox.

Ask it about a torque spec for your car? Yup, wrong. Ask it to provide sources? Less wrong, but still wrong. It told me my viscous fan has a different thread than it actually has. Had I listened, I would've shredded my thread.

My car is old, well documented and widely distributed.

Doesn't matter whether it's Claude or ChatGPT. Don't get me started on code. I care about things being correct.

UncleMeat 9 hours ago | parent | prev | next [-]

A couple weeks ago I was working on a parser that needed to handle a new file format that was a large change from existing formats. I wanted some test inputs, both valid and invalid. I had the codebase of a toolchain that I knew could generate valid files, some public documentation about the new file format, and my parser codebase.

A good problem to throw at AI, I thought. I handed the tools to a SOTA model and asked it to generate me some files. Garbage. Some edits to the prompts and I still get garbage. Okay, generating a binary with complex internal structure directly is pretty hard. Let's ask it to tell me how to make the toolchain generate these for me. It gives me back all sorts of CLI examples. None work. I keep telling it what output I am getting and how it differs from what I want. Over and over it fails.

I finally reach out to somebody on the toolchain team and they tell me how to do it. Great, now I can generate some valid files. Let's try to generate some invalid ones to test error paths. I've got a file. I've got the spec. I ask the LLM to modify the file to break the spec in a single way each time and tell me which part of the spec it broke each time. Doesn't work. Okay. I ask it to write me a Python program that does this. It works a little, but not consistently, and I need to inspect each output carefully. Finally I throw my files into a coverage-guided fuzzer corpus, and in a short period of time it has generated inputs with excellent branch coverage for me.
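For reference, even a blind, spec-unaware mutator is only a few lines of Python. This is an illustrative sketch, not the actual script; the file names and mutation choices are made up, and the spec-aware version that can say which rule it broke is the part that was hard:

    # Illustrative sketch only: produce "invalid" variants of a known-valid
    # binary input by applying exactly one corruption per output file.
    import random
    from pathlib import Path

    def mutate_once(data: bytes, rng: random.Random) -> tuple[bytes, str]:
        # Apply a single corruption and return a description of it.
        buf = bytearray(data)
        pos = rng.randrange(len(buf))
        kind = rng.choice(["flip", "truncate", "zero"])
        if kind == "flip":
            buf[pos] ^= 0xFF
            return bytes(buf), f"flipped byte at offset {pos}"
        if kind == "truncate":
            return bytes(buf[:pos]), f"truncated file at offset {pos}"
        buf[pos] = 0
        return bytes(buf), f"zeroed byte at offset {pos}"

    rng = random.Random(0)
    valid = Path("valid_input.bin").read_bytes()  # hypothetical valid input
    for i in range(20):
        mutated, why = mutate_once(valid, rng)
        Path(f"invalid_{i:02d}.bin").write_bytes(mutated)
        print(f"invalid_{i:02d}.bin: {why}")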

What would "effective" have looked like to you in this situation?

wordofx 9 minutes ago | parent [-]

Edit: also, thanks for actually replying with a great comment, because usually the replies aren't even worth entertaining and only prove my point.

This is the classic example of “I want it to do everything. But it didn’t do what I wanted. So obviously it’s not helpful.”

It doesn’t solve /all/ problems. And some models are better than others at certain tasks. You see people talk about PRDs and say “I got Claude to create a PRD and it sucked,” but sit them down with o3 to generate the same PRD and they’re like “oh wow, this is actually pretty decent.”

But it’s difficult to help over HN. As a sort of related example: back in May I had a requirement to ingest files from a new POS system we didn’t currently support. The exports we get are CSV, but the first character decides the type of line that needs to be parsed and how many commas will be in that line.
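Roughly, the shape of the problem looks like this. This is an illustrative Python sketch, not our parser: it assumes the leading character of each line picks the record type, and the record names and field counts here are invented.

    # Illustrative sketch: the leading character of a line picks the record
    # type and the number of comma-separated fields expected on that line.
    # Record types ("H", "S", "T") and field counts are made up.
    import csv
    from io import StringIO

    RECORD_TYPES = {
        "H": ("header", 4),
        "S": ("sale", 7),
        "T": ("tender", 5),
    }

    def parse_export(text: str) -> list[dict]:
        records = []
        for row in csv.reader(StringIO(text)):
            if not row:
                continue
            name, expected = RECORD_TYPES.get(row[0][:1], ("unknown", None))
            if expected is not None and len(row) != expected:
                raise ValueError(
                    f"{name} line has {len(row)} fields, expected {expected}")
            records.append({"type": name, "fields": row})
        return records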

I used o3 and explained everything I could about how the file worked and how it should be parsed into a generic list, etc., and got it to generate a basic PRD with steps and assertions along the way.

I then fed this into Cursor using Claude Sonnet 4, along with the CSV files, asking it to look at the files and the PRD and tell me if there was anything that didn’t make sense. Then I asked it to begin implementing the steps one by one, letting me check before moving on to the next step. A couple of times it misunderstood and did things slightly wrong, but I just corrected it or asked Claude to correct it. But it essentially wrote code. Wrote tests. Verified. I verified. It moved on.

In the past these tasks typically took a few days to implement and roll out. This whole thing took about an hour. The code is in the style of the other parsers, except that it optimised some parts and, despite being a bit more complicated, runs faster than some of our older parsers.

While most of my usage is around programming, we also use AI for marketing and customer leads, and I have scheduled tasks giving me summaries of tickets from the support system so I know what should be prioritised for the day. So even though AI doesn’t solve all programming issues, we get value in almost all aspects of the business.

hansvm 13 hours ago | parent | prev | next [-]

I'll pick a few concrete tasks: Building a substantially faster protobuf parser, building a differentiable database, and building a protobuf pre-compression library. So far, AI's abilities have been:

1. Piss-poor at the brainstorming and planning phase. For the compression thing I got one halfway decent idea, and it's one I already planned on using.

2. Even worse at generating a usable project structure or high-level API/skeleton. The code is unusable because it's not just subtly wrong; it doesn't match any cohesive mental model, meaning the first step is building that model yourself and then figuring out how to ram-rod the AI's solution into it.

3. Really not great at generating APIs/skeletons matching your mental model. The context is too large, and performance drops.

4. Terrible at filling in the details for any particular method. It'll make subtle mistakes, like handling carryover data at the end of a loop but handling it always instead of only when it hasn't already been handled (see the sketch after this list). Everything type checks, and if it doesn't then I can't rely on the AI to give a correct result instead of the easiest way to silence the compiler.

5. Very bad at incorporating invariants (lifetimes, allocation patterns, etc.) into its code when I ask it to make even minor tweaks, even when explicitly prompted to consider such-and-such edge case.

6. Blatantly wrong when suggesting code improvements, usually breaking things, and in a way where you can't easily paper over the issue to create something working "from" the AI code.
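To make item 4 concrete, here is the shape of the carryover mistake. This is a minimal Python sketch for illustration, not code from any of those projects:

    # Minimal sketch of the carryover-at-end-of-loop pattern from item 4.
    # Correct code only flushes leftover data when some is actually left;
    # the subtle bug is flushing unconditionally, which emits a spurious
    # empty chunk whenever the input length is an exact multiple of
    # chunk_size.
    def chunked_sums(values: list[int], chunk_size: int) -> list[int]:
        sums: list[int] = []
        carry: list[int] = []
        for v in values:
            carry.append(v)
            if len(carry) == chunk_size:
                sums.append(sum(carry))
                carry = []
        if carry:  # correct: only handle carryover not already handled in the loop
            sums.append(sum(carry))
        # buggy variant: sums.append(sum(carry)) with no guard
        return sums

    assert chunked_sums([1, 2, 3, 4], 2) == [3, 7]   # exact multiple: nothing left over
    assert chunked_sums([1, 2, 3], 2) == [3, 3]      # leftover [3] flushed exactly once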

Etc. It just wasn't well suited to any of those tasks. On my end, the real work is deeply understanding the problem, deriving the only possible conclusions, banging that into code, and then doing a pass or three cleaning up the semicolon orgasm from the page. AI is sometimes helpful in that last phase, but I'm certain it's not useful for the rest yet.

My current view is that the difference in viewpoints stems from a combination of the tasks being completed (certain boilerplate automation crap I've definitely leaned on AI to handle; maybe that's all some devs work on?) and current skill progression (I've interviewed enough people to know that the work I'm describing as trivial doesn't come naturally to everyone yet, so it's tempting to say that it's you holding your compiler wrong rather than me holding the AI wrong).

Am I wrong? Should AI be able to help with those things? Is it more than a ~5% boost?

lazide 14 hours ago | parent | prev | next [-]

Personally, everyone I’ve seen using AI either clearly didn’t understand what they were doing (in a ‘that’s not doing what you think it’s doing’ sense), often in a way that produced good-sounding garbage, or ended up rewriting almost all of it anyway to get the output they actually wanted.

At this point I literally spend 90% of my time fixing other teams’ AI ‘issues’ at a Fortune 50.

wordofx 11 hours ago | parent | prev [-]

Once again, the replies only prove me right: desperately trying to justify an “AI bad, I’m superior” mentality.

acdha 8 hours ago | parent [-]

This is pure trolling when you are unable to engage with the comments or provide evidence supporting your position.

elktown 7 hours ago | parent | next [-]

The comment history [0] suggests that it isn't trolling but fanaticism, with the telltale hope of demise for everyone who can't see the light.

If I digress a bit, I wonder what it is with some hypes, in tech or in society more broadly, where something reaches a critical mass of publicity and suddenly a surprisingly large portion of people become convinced, regardless of whether they have any actual knowledge of the subject. Anecdotally, I've been watching completely unrelated sports streams and seen a really sincere off-topic comment that AI will change everything now and "society isn't ready," or similar. But I guess it shouldn't be surprising: when supposedly intelligent tech gurus and media folks eat everything up and start peddling the narrative to the general public, why shouldn't one believe it? It's a confluence of incentives that becomes a positive feedback loop until it all blows up.

[0]: https://news.ycombinator.com/item?id=44163596

acdha 6 hours ago | parent [-]

I guess I was thinking of it as trolling inspired by fandom. It’s cool to like something, but no positive discussion will grow out of making a provocative statement and then saying anyone who disagrees is proof you’re right.

elktown 5 hours ago | parent [-]

Yep. The end result is certainly the same either way.

wordofx an hour ago | parent | prev [-]

This is HN. Even with evidence, people here get their feelings hurt and downvote. You cannot engage in conversation when one side has a “the science is settled” mentality.

Or, in the case of AI, the majority of people are saying “wow, this is useful and helpful,” and then HN is like “it didn’t one-shot the answer from the first prompt I gave it, so it’s useless.”

You only need to read every discussion on HN about AI. The majority here who are against AI are also the same people who really have no idea how to use it.

So in the end there is no conversation to engage with, because HN has tunnel vision: “Elon bad,” “AI bad,” “science bad,” “anything not aligned with my political view bad.”