iamleppert 10 hours ago

Meanwhile, in the real world, as a software developer who uses every possible AI coding agent I can get my hands on, I still have to watch it like a hawk. The problem is one of trust. There are some things it does well, but it's often impossible to tell when it will make a mistake. So you have to treat every piece of code it produces as suspect. If I could have automated my job by now and been on a beach, I would have done it. Instead of writing code by hand, I now largely converse with LLMs, but I still have to be present, watching them and verifying their outputs.

eknkc 10 hours ago | parent | next [-]

Yeah, but just look at what happened within the last 2 years. I was not convinced about the AI revolution, but I bet in another 2 years we won't be looking at the output.

polotics 10 hours ago | parent | next [-]

Not so sure. There are idiosyncrasies now within the various models; I suspect they are all the result of RLHF, and they cause side effects. I'm not sure that more attention-is-all-you-need is necessarily going to give us another step change: maybe more general intelligence, but not more focus. Possibly we also soon end up with grokked AIs on all sides, pushing their agenda whatever you asked... Gemini: "no this won't work with Cloudflare, I created your GCP account, there you go" OpenAI: "I am certain you really wanted me to do all these other tasks and I have done them, you should upgrade your tokens plan" etc. (you know how to fill in for DeepSeek and Grok already, right?)

falloutx 9 hours ago | parent | prev [-]

Tech can always hit a plateau. Here's hoping Anthropic & OpenAI run out of money.

headcanon 10 hours ago | parent | prev [-]

I've been coming around to the view that the time spent code-reviewing LLM output is better spent creating evaluation/testing rigs for the product you are building. If you're able to highlight errors in tests (unit, e2e, etc.) and send the detailed error back to the LLM, it will generally do a pretty good job of correcting itself. It's a hill-climbing system; you just have to build the hill.
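
A rough sketch of what that loop could look like, in Python. Everything here is made up for illustration: `fake_llm` stands in for whatever model call you actually use, `run_tests` is a toy test rig for a hypothetical `add` function, and `repair_loop` is just one way to wire the two together. A real rig would run your actual test suite, ideally in a sandbox rather than with `exec`.

```python
from typing import Callable, Optional, Tuple


def run_tests(source: str) -> Optional[str]:
    """Toy test rig: return a detailed error report, or None if all checks pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; do not exec untrusted code
        add = namespace["add"]
        assert add(2, 3) == 5, f"add(2, 3) returned {add(2, 3)!r}, expected 5"
        assert add(-1, 1) == 0, f"add(-1, 1) returned {add(-1, 1)!r}, expected 0"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def repair_loop(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Ask for code, test it, and feed the error back until the tests pass."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        feedback = evaluate(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


def fake_llm(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a model: buggy first try, fixed once it sees an error."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


code, rounds = repair_loop(fake_llm, run_tests)
print(f"passed after {rounds} round(s)")
```

The point of the design is that the human effort goes into `run_tests` (the hill), not into reading each candidate; the more specific the error message, the more the model has to climb on.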