There is a significant gap between agents and models.

Agents use multiple models, can interact with the environment, and take many steps. You can get them to reflect on what they have done and what they need to do next, without intervention. One of the more important things they can do is understand their environment: check the libraries and versions in use, fetch or read the docs, and then base their edits on those. Much of the SDK hallucination can be removed this way, and when they also run the compiler to validate their edits, they get even better.

Models typically operate on a turn-by-turn basis, with only the context and messages the user provides.
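A minimal Python sketch of that contrast. It assumes a hypothetical `fake_model` in place of a real model call, and it uses Python's built-in `compile()` as a stand-in for running the project's compiler. A single turn returns whatever the model produced. The agent loop validates each candidate and feeds the error back until the code compiles or it runs out of steps. All names here (`validate`, `single_turn`, `agent_loop`, `fake_model`) are illustrative, not any real agent framework's API.

```python
from typing import Callable, Optional

# Hypothetical proposer: (task, feedback from last attempt) -> candidate source.
# In a real agent this would be a model call.
Proposer = Callable[[str, Optional[str]], str]


def validate(source: str) -> Optional[str]:
    """Return None if the source compiles, else a short error message.

    Python's compile() stands in for 'running compile' on a real project.
    """
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def single_turn(task: str, propose: Proposer) -> str:
    # A bare model: one call, no feedback from the environment.
    return propose(task, None)


def agent_loop(task: str, propose: Proposer, max_steps: int = 5) -> tuple[str, int]:
    # An agent: propose, check against the environment, feed errors back.
    feedback: Optional[str] = None
    for step in range(1, max_steps + 1):
        candidate = propose(task, feedback)
        feedback = validate(candidate)
        if feedback is None:
            return candidate, step
    raise RuntimeError(f"no valid candidate after {max_steps} steps: {feedback}")


def fake_model(task: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in: the first attempt is missing a colon,
    # and the retry "uses" the compiler feedback to fix it.
    if feedback is None:
        return "def add(a, b)\n    return a + b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    first = single_turn("write add()", fake_model)
    print(validate(first) is not None)  # True: the single turn ships a broken edit
    fixed, steps = agent_loop("write add()", fake_model)
    print(steps)  # 2
```

The loop does not make the model smarter. It only gives the model the same signal a developer gets from running the build, which is the gap the comment is pointing at.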


th0ma5 5 days ago

You can't make any guarantees, and manually watching everything is not tenable. "Much" instead of "all" means having to check it all, because which parts fall under "much" is random.

	verdverm 5 days ago
		You don't have to watch it, just as you don't have to watch your peers. We already have code review processes in place. You're never going to get all; you don't have all today. Humans make mistakes too and have to run programs to discover their errors.