ryanjshaw 4 hours ago

Until a year ago I believed as the author did. Then LLMs got to the point where they sit in meetings like I do, make notes like I do, have a memory like I do, and their context window is expanding.

The only issue I saw after a month of building something complex from scratch with Opus 4.6 was poor adherence to high-level design principles and poor consistency. This can be solved with expert guardrails, I believe.

It won’t be long before AI employees join daily standup and deliver work alongside the team, with other users in the org not even realizing or caring that it’s an AI “staff member”.

It won’t be much longer after that when they will start to tech lead those same teams.

symfrog an hour ago

The closer you get to releasing software, the less useful LLMs become. They tend to go into loops of 'Fixed it!' without having fixed anything.

In my opinion, attempting to hold the hand of the LLM via prompts in English for the 'last mile' to production-ready code runs into the fundamental problem of ambiguity in natural languages.

In my experience, developers who believe LLMs are good enough for production are either building systems that are not critical (e.g. 80% is correct enough), or they do not have the experience to detect how LLM-generated code would fail in production beyond the 'happy path'.

empath75 an hour ago

This is not my experience with Claude Code. It does forget big-picture things, but if you scope your changes well it’s fine.

symfrog an hour ago

I would estimate that out of every 200 lines of code that Claude Code produces, I notice at least 1 issue that would cause severe problems in production.

In my opinion these discussions should include MREs (minimal reproducible examples) in the form of prompts to ground the discussion.

For example, take this prompt and put it into Claude Code. Can you see the problematic ways it handles transactions?

---

The invoicing system is being merged into the core system that uses Postgres as its database. The core system has a table for users with columns user_id, username, creation_date. The invoicing data is available in a json file with columns user_id, invoice_id, amount, description.

The data is too big to fit in memory.

Your role is to create a Python program that creates a table for the invoices in Postgres and then inserts the data from the json file. Users will be accessing the system while the invoices are being inserted.

---
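For reference, here is a minimal sketch of one transaction-aware way to approach the prompt above. It is not the output of any model and not necessarily the answer the prompt is fishing for. Assumptions, none of which come from the prompt: the file is JSON Lines (one invoice object per line), the table is named `invoices`, the column types are guesses, and `conn` is any DB-API connection. The helper names `read_invoices`, `in_batches`, `create_table` and `load_invoices` are made up for illustration. The idea is to stream the file and commit one small batch at a time, so no single transaction stays open for the whole load while users are on the system.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def read_invoices(path: str) -> Iterator[tuple]:
    """Yield one invoice row at a time so the file never has to fit in memory.

    Assumption: the file is JSON Lines, one object per line.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            yield (rec["user_id"], rec["invoice_id"], rec["amount"], rec["description"])


def in_batches(rows: Iterable, size: int) -> Iterator[list]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def create_table(conn) -> None:
    """Create the target table. Table name and column types are assumptions."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_id BIGINT PRIMARY KEY, "
        "user_id BIGINT NOT NULL, "
        "amount NUMERIC NOT NULL, "
        "description TEXT)"
    )
    conn.commit()
    cur.close()


def load_invoices(conn, rows: Iterable[tuple], batch_size: int = 1000,
                  placeholder: str = "%s") -> int:
    """Insert rows in short transactions, one commit per batch.

    `placeholder` is the driver's parameter marker ("%s" for psycopg2).
    Returns the number of rows committed.
    """
    marks = ", ".join([placeholder] * 4)
    sql = (
        "INSERT INTO invoices (user_id, invoice_id, amount, description) "
        f"VALUES ({marks})"
    )
    total = 0
    for batch in in_batches(rows, batch_size):
        cur = conn.cursor()
        try:
            cur.executemany(sql, batch)
            conn.commit()  # keep each transaction small and short-lived
        except Exception:
            conn.rollback()  # only the failing batch is undone
            raise
        finally:
            cur.close()
        total += len(batch)
    return total
```

With psycopg2 this would be called as `load_invoices(conn, read_invoices(path))` on a connection from `psycopg2.connect(...)`. The trade-off is that a failure part-way leaves earlier batches committed, so a rerun needs some way to skip rows that are already loaded; whether that or a single all-or-nothing transaction is right depends on requirements the prompt leaves open.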

ajshahH 16 minutes ago

Yes, but knowing how to scope your changes requires a lot of expertise.

Roark66 3 hours ago

After 2 years of using all of these tools (Claude C, Gemini CLI, opencode with all models available) I can tell you they are a huge enabler, but you have to provide these "expert guardrails" by monitoring every single deliverable.

For someone who is able to design an end-to-end system by themselves, these tools offer a big time saving, but they come with dangers too.

Yesterday I had a mid-level dev on my team proudly present a web tool he "wrote" in Python (to be run on localhost) that runs kubectl in the background and presents things like versions of images running in various namespaces etc. It looked very slick; I can already imagine the product managers asking for it to be put on the network.

So what's the problem? For one, no threading whatsoever (all queries run in a single thread), no auth, and on and on. A maintenance nightmare waiting to happen. That is the risk when a person who knows something, but not enough, builds tools by themselves.

ryanjshaw 2 hours ago

Yup. I’m not an expert so maybe I’m completely off base, but if I were OpenAI or Anthropic I’d likely just hire 1000 highly skilled engineers across multiple disciplines, tell them to build something in their domain of expertise, then critique the model’s output, iteratively work on guardrails for a month or two until the model one-shots the problem, and package that into the new release.

LiamPowell an hour ago

That's exactly what they are doing via dataannotation.tech and other services.

bakugo 2 hours ago

I've been hearing this for several years. How much longer is "it won't be long"?