nl 4 hours ago
In practice I don't think any harness (happy to be corrected here!) uses the less capable models for writing code. The cost trade-offs are rarely worth it. They are often used for reading code, though.

To expand on this: while the "big model to write a plan, small model to write the specific code" idea is quite common, it trips up on edge cases. In theory the flow works like this:

- small, fast models read lots of code and pass the details to the large model
- the large model takes those details and writes a detailed plan
- medium models write the code

The issue happens when the medium model hits something the plan didn't take into account (which happens a lot, because the big model never actually read the code). Then it has to either guess or pass the problem back to the large model. If it guesses, the plan usually starts to fall to bits. If it passes it back, the large model inevitably has to start reading lots of code. At that point you are paying for expensive tokens to read the code, so you might as well have it write the code too (far fewer tokens are written than are read).

It might be possible to get this to work, but I haven't seen anyone who has tried agentic work with frontier models be satisfied with this hybrid setup. I'd note that Amp (mentioned above) is probably the leader in using multiple providers in a coding agent, but it still uses frontier models to write code.
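To make the token-cost argument concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `ModelPrice` class, the per-million-token prices and the token counts are made up, not any provider's real numbers, and it only models the case where the medium model gets stuck and escalates. It shows that once the large model has to read the code anyway, having it write the change too adds relatively little, because written tokens are a small fraction of read tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per million tokens in dollars. The numbers used below are
    hypothetical placeholders, not any provider's actual pricing."""
    input_per_m: float
    output_per_m: float

    def cost(self, tokens_in: int, tokens_out: int) -> float:
        return (tokens_in * self.input_per_m + tokens_out * self.output_per_m) / 1_000_000


# Hypothetical tiers: the large model is 5x the medium model's price.
LARGE = ModelPrice(input_per_m=15.0, output_per_m=75.0)
MEDIUM = ModelPrice(input_per_m=3.0, output_per_m=15.0)


def large_reads_and_writes(read: int, written: int) -> float:
    """The large model reads the code and writes the change itself."""
    return LARGE.cost(read, written)


def hybrid_after_escalation(read: int, written: int, plan: int) -> float:
    """The medium model got stuck, so the large model reads the code and
    writes a revised plan; the medium model then reads the code plus the
    plan and writes the change."""
    replan = LARGE.cost(read, plan)
    implement = MEDIUM.cost(read + plan, written)
    return replan + implement


if __name__ == "__main__":
    # Hypothetical task: far more tokens read than written.
    read, written, plan = 200_000, 5_000, 2_000
    direct = large_reads_and_writes(read, written)
    hybrid = hybrid_after_escalation(read, written, plan)
    # Fraction of the large-only cost that comes from output tokens.
    write_share = LARGE.cost(0, written) / direct
    print(f"large model only:       ${direct:.3f}")
    print(f"hybrid with escalation: ${hybrid:.3f}")
    print(f"share of large-only cost spent writing: {write_share:.0%}")
```

With these made-up numbers the escalated hybrid run costs about $3.83 against $3.375 for the large model alone, and writing is only about 11% of the large-only cost. Different prices or a different read/write ratio will shift the result; the sketch only illustrates why read-heavy work makes the escalation path expensive.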
sanderjd an hour ago
Great info, thanks!