keeda 17 hours ago

People can't be trusted to do anything either, which is why we have guardrails and checks and balances and audits. That is why in software, for instance, we have code reviews and tests and monitoring and other best practices. That is probably also why LLMs have made the most headway in software development; we already know how to deal with unreliable workers that are humans and we can simply transfer that knowledge over.

As was discussed in an HN subthread a few weeks ago, the key to developing successful LLM applications is going to be figuring out how to put in the necessary business-specific guardrails, with a human in the loop as the fallback.

lmm 17 hours ago | parent [-]

> People can't be trusted to do anything either, which is why we have guardrails and checks and balances and audits. That is why in software, for instance, we have code reviews and tests and monitoring and other best practices. That is probably also why LLMs have made the most headway in software development; we already know how to deal with unreliable workers that are humans and we can simply transfer that knowledge over.

The difference is that humans eventually learn. We accept that someone who joins a team will be net-negative for the first few days, weeks, or even months. If they keep making the same mistakes that were picked out in their first code review, as LLMs do, eventually we fire them.

keeda 17 hours ago | parent [-]

LLMs may not learn on the fly (yet), but these days they do have some sort of memory that they automatically bring into their context. It's probably just a summary that's loaded into the context, but I've had dozens of conversations with ChatGPT over the years, and it remembers my past discussions, interests, and preferences. Many times it has connected dots across conversations months apart to intuit what I had in mind, and proactively steered the discussion where I wanted it to go.

Worst case, if they don't do this automatically, you can simply "teach" them by updating the prompt to watch for a specific mistake (similar to how we often add a test when we catch a bug).

But it need not even be that cumbersome. Even weaker models do surprisingly well with broad guidelines. Case in point: https://news.ycombinator.com/item?id=42150769

yahoozoo 3 hours ago | parent [-]

Yeah, the memory feature is just a summary of past conversations added to the system prompt.
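
For illustration, here is a rough sketch of what "a summary added to the system prompt" could look like, combined with the "update the prompt when you catch a mistake" idea from upthread. This is not how ChatGPT or any specific product implements memory; the function name `build_system_prompt`, its parameters, and the section headings are all made up for the example.

```python
from typing import Sequence


def build_system_prompt(
    base: str,
    memory_summary: str = "",
    lessons: Sequence[str] = (),
) -> str:
    """Assemble a system prompt from a base instruction, a memory summary,
    and a list of previously caught mistakes.

    Hypothetical helper: the names and layout are assumptions for
    illustration, not any vendor's actual format.
    """
    parts = [base.strip()]

    # "Memory": a plain-text summary of earlier conversations, if there is one.
    if memory_summary.strip():
        parts.append("Known context about the user:\n" + memory_summary.strip())

    # "Teaching": one line per mistake caught in review, much like adding
    # a regression test after finding a bug.
    if lessons:
        bullets = "\n".join(f"- {lesson}" for lesson in lessons)
        parts.append("Mistakes to avoid:\n" + bullets)

    return "\n\n".join(parts)


if __name__ == "__main__":
    print(
        build_system_prompt(
            "You are a code reviewer.",
            memory_summary="Prefers Go. Works on payment services.",
            lessons=["Do not skip error checks on I/O calls."],
        )
    )
```

The model's weights never change here; the "learning" lives entirely in the text that gets prepended to each new conversation.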