w10-1 2 hours ago

Human delegation is disciplined as much by incentive alignment as by instruction. The same is true for LLMs. The problem is that you can't fully dictate intentions, LLM or human, because delegates/agents need autonomy to be useful.

The SOTA labs are focused on making models more capable and then bolting on guardrails for safety. It would be better to work on baking in incentive alignment, which probably means eliciting more incentive details from the LLM user. That's what I'd be working on at Apple, where the user might be induced to share a level of local-only detail that could align the AI agents.
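
To make "eliciting incentive details" concrete, here's a minimal sketch of one way it could work: ask the user directly for goals and hard limits, keep that record local, and distill it into an instruction preamble for the agent. All the names here (UserIncentives, elicit_incentives, alignment_preamble) are hypothetical, not any real Apple or vendor API.

```python
# Hypothetical sketch: local-only incentive details, distilled into agent instructions.
from dataclasses import dataclass, field


@dataclass
class UserIncentives:
    """Local-only record of what the user actually cares about."""
    goals: list[str] = field(default_factory=list)        # outcomes the user wants
    hard_limits: list[str] = field(default_factory=list)  # actions never acceptable


def elicit_incentives() -> UserIncentives:
    """Ask the user a few pointed questions instead of inferring intent."""
    goal = input("What outcome matters most for this task? ").strip()
    limit = input("What should the agent never do on your behalf? ").strip()
    return UserIncentives(goals=[goal], hard_limits=[limit])


def alignment_preamble(incentives: UserIncentives) -> str:
    """Turn the local incentive record into instructions prepended to the agent prompt."""
    lines = ["Act only in service of these user goals:"]
    lines += [f"- {g}" for g in incentives.goals]
    lines.append("Never do any of the following, even if it would help the goal:")
    lines += [f"- {h}" for h in incentives.hard_limits]
    return "\n".join(lines)


if __name__ == "__main__":
    prefs = elicit_incentives()       # stays on device
    print(alignment_preamble(prefs))  # only this distilled preamble reaches the model
```

The point of the sketch is the division of labor: the raw incentive details never leave the device; the agent only sees the distilled constraints, so alignment is baked into the delegation rather than added as an after-the-fact guardrail.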