simianwords 8 hours ago:
OT, but my intuition says there's a spectrum:

- non-thinking models
- thinking models
- best-of-N models like Deep Think and GPT Pro

Each one sits at a certain computational complexity. Simplifying a bit, I think they map to linear, quadratic, and n^3 respectively. I think there's a certain class of problems that can't be solved without thinking, because solving them necessarily involves writing in a scratchpad. And the same goes for best-of-N, which involves exploring.

Two open questions:

1) What's the next level up -- is there a 4th option?
2) Can a sufficiently large non-thinking model perform the same as a smaller thinking one?
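To make the best-of-N tier concrete, here's a rough sketch (purely hypothetical -- generate and score stand in for a real LLM call and a judge, they're not anyone's actual API):

    import random

    # Hypothetical stand-ins so the sketch runs; swap in a real
    # LLM API call and a real judge/verifier.
    def generate(prompt: str, temperature: float = 1.0) -> str:
        return f"candidate-{random.randint(0, 9999)} for: {prompt}"

    def score(prompt: str, answer: str) -> float:
        return random.random()  # placeholder judge score

    def best_of_n(prompt: str, n: int = 8) -> str:
        # Draw N independent samples, keep the one the judge rates
        # highest. Each sample can itself be a "thinking" generation,
        # which is why this tier costs roughly another factor of N.
        candidates = [generate(prompt, temperature=1.0) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    print(best_of_n("prove the lemma", n=4))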
futureshock 4 hours ago:
I think step 4 is the agent swarm. A manager model gets the prompt and spins up a swarm of looping subagents, maybe assigns them different approaches or subtasks, then reviews the results, refines the context files, and redeploys the swarm on a loop until the problem is solved or your credit card is declined.
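Roughly this shape of loop (hypothetical sketch -- run_subagent and review are stubs standing in for real agent calls, not any framework's actual API):

    import concurrent.futures

    # Hypothetical stubs standing in for real LLM agent calls.
    def run_subagent(context: str, approach: str) -> str:
        return f"result of '{approach}' given '{context}'"

    def review(results: list[str]) -> tuple[bool, str]:
        # Manager pass: decide if we're done and refine the shared context.
        return False, "refined context from: " + "; ".join(results)

    def swarm(task: str, approaches: list[str], max_rounds: int = 3) -> str:
        context = task
        for _ in range(max_rounds):  # or until the credit card is declined
            # Deploy the swarm in parallel, one subagent per approach.
            with concurrent.futures.ThreadPoolExecutor() as pool:
                results = list(pool.map(lambda a: run_subagent(context, a), approaches))
            solved, context = review(results)
            if solved:
                break
        return context

    print(swarm("fix the bug", ["bisect", "add logging", "read the spec"]))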
NitpickLawyer 7 hours ago:
> best of N models like deep think an gpt pro

Yeah, these are made possible largely by better use of high context lengths. You also need a step that gathers all N outputs, selects the best ideas / parts, and compiles the final output. Goog has been SotA at useful long context for a while now (since 2.5, I'd say). Many others have shipped "1M context", but their usefulness past 100k-200k is iffy.

What's even more interesting than maj@n or best-of-n is pass@n. For a lot of applications you can frame the question and search space such that pass@n is your success rate. Think security exploit finding, or optimisation problems with quick checks (better algos, kernels, infra routing, etc.). It doesn't matter how good your pass@1 or avg@n is; all you care about is that you find more as you spend more time. Literally throwing money at the problem.
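The pass@n framing in a sketch (generate_candidate and verify are hypothetical stubs; the point is the loop, plus the math: with independent samples, P(at least one hit) = 1 - (1 - pass@1)^n):

    import random

    # Hypothetical sampler; in practice an LLM call at temperature > 0.
    def generate_candidate(problem: str) -> str:
        return f"attempt-{random.randint(0, 9999)}"

    # The cheap, reliable check that makes pass@n the right metric:
    # run the exploit, benchmark the kernel, re-run the test suite.
    def verify(problem: str, candidate: str) -> bool:
        return random.random() < 0.02  # placeholder: ~2% pass@1

    def search(problem: str, budget: int) -> str | None:
        # Keep sampling until the verifier accepts or the budget runs out.
        # Assuming independence, success rate is 1 - (1 - 0.02)**budget,
        # so e.g. budget=100 already gets you ~87%.
        for _ in range(budget):
            candidate = generate_candidate(problem)
            if verify(problem, candidate):
                return candidate
        return None

Spend more budget, find more -- that's the whole trick.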
mnicky 7 hours ago:
> can a sufficiently large non thinking model perform the same as a smaller thinking?

Models from Anthropic have always been excellent at this. See e.g. https://imgur.com/a/EwW9H6q (top left: Opus 4.6 without thinking).