jungturk 2 hours ago

Setting aside that we're living in a universe that's full of (practically) deterministic processes built over probabilistic components (and which behave sufficiently reliably without any human in the loop), I think the specific failure mode you're citing is that there aren't enough gates and constraints applied to the processes you've seen.

LLMs can contribute quite reliably given very narrow prompts and short horizons (keeping turns low and context brief). If you chain a bunch of these narrow contributions together and define guardrails (structured outputs, online evals, another LLM as judge or jury, etc.), you can produce a very repeatable workflow that reliably meets defined service levels.
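As a rough illustration of what one gated step might look like, here's a minimal Python sketch. Everything in it is an assumption for the sake of the example: `LLM` is a hypothetical stand-in for whatever model client you use (just a function from prompt to text), and `run_step`, `classify_ticket`, and `GuardrailError` are made-up names, not any library's API. The judge here is a plain predicate, but it could just as well be a second model call.

```python
import json
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns raw text.
LLM = Callable[[str], str]


class GuardrailError(Exception):
    """Raised when a step cannot produce output that passes its gates."""


def run_step(llm: LLM, prompt: str, required_keys: set,
             judge: Callable[[dict], bool], max_attempts: int = 3) -> dict:
    """Run one narrow step, retrying until the output passes every gate."""
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            out = json.loads(raw)  # gate 1: output must parse as JSON
        except json.JSONDecodeError:
            continue
        if not isinstance(out, dict) or not required_keys.issubset(out):
            continue  # gate 2: expected fields must be present
        if judge(out):  # gate 3: independent check (could be another LLM)
            return out
    raise GuardrailError(f"no valid output after {max_attempts} attempts")


def classify_ticket(llm: LLM, text: str) -> dict:
    """One narrow contribution: label a support ticket from a fixed set."""
    allowed = {"billing", "bug", "other"}
    return run_step(
        llm,
        prompt=f'Classify this ticket. Reply as JSON {{"label": ...}}.\n{text}',
        required_keys={"label"},
        judge=lambda out: isinstance(out["label"], str) and out["label"] in allowed,
    )
```

Chaining is then just feeding the validated dict from one step into the prompt of the next, so a bad output gets stopped (or retried) at the step that produced it instead of propagating downstream.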

The obvious catch is that you've got to define the workflow and implement all the guardrails yourself, not hope that the LLM will infer them during a session or from a one-shot prompt.