aunty_helen 5 days ago
You split the tasks up across sub-agents. This is something my company builds on top of LangGraph.
potatolicious 5 days ago
Sure, go try it and evaluate it rigorously end-to-end over a sufficient number and variety of tools. For the purposes of the exercise, let's conservatively say ~2000 tools covering ~100 major verticals of use cases. Even that may be too narrow for a true general-purpose assistant, but it's at least a good start. You can slice the sub-agents however you'd like. If you can get recall over 70% on real user utterances (not contrived eval utterances authored by your devs and MLEs) across all the verticals, use cases, and tool uses, I'd be extremely impressed. Heck, my opinion won't even matter: if you can get the recall of such a system over that bar, you'll have cracked something nobody else has and should actively try to sell it to Google for nine figures.
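To make that bar concrete, here's a minimal Python sketch of how such an eval might be scored. It assumes each eval case is one real user utterance labeled with its vertical and the single tool it should have triggered. `EvalCase`, `per_vertical_recall`, `pooled_recall`, `clears_bar`, and the tool names are all hypothetical; they are not part of LangGraph or any other library. It also assumes "over 70% across all the verticals" means every vertical has to clear the bar on its own, which is stricter than pooling all cases together.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One real user utterance and the tool it should have triggered (hypothetical schema)."""
    vertical: str
    utterance: str
    expected_tool: str
    predicted_tool: Optional[str]  # None means the system called no tool at all


def per_vertical_recall(cases: List[EvalCase]) -> Dict[str, float]:
    """Fraction of cases in each vertical where the expected tool was actually called."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.vertical] += 1
        if case.predicted_tool == case.expected_tool:
            hits[case.vertical] += 1
    return {v: hits[v] / totals[v] for v in totals}


def pooled_recall(cases: List[EvalCase]) -> float:
    """Recall over all cases at once; large verticals dominate this number."""
    hits = sum(c.predicted_tool == c.expected_tool for c in cases)
    return hits / len(cases)


def clears_bar(cases: List[EvalCase], threshold: float = 0.70) -> Tuple[bool, Dict[str, float]]:
    """True only if every vertical, including the worst one, exceeds the threshold."""
    recalls = per_vertical_recall(cases)
    return min(recalls.values()) > threshold, recalls


if __name__ == "__main__":
    # Toy data with made-up tool names, purely for illustration.
    cases = [
        EvalCase("music", "play something chill", "music.play", "music.play"),
        EvalCase("music", "skip this song", "music.skip", "music.skip"),
        EvalCase("music", "turn it up", "music.volume", "music.volume"),
        EvalCase("calendar", "move my 3pm to tomorrow", "calendar.reschedule", "calendar.reschedule"),
        EvalCase("calendar", "am I free friday", "calendar.query", None),
    ]
    ok, recalls = clears_bar(cases)
    print(pooled_recall(cases))  # 0.8
    print(recalls)               # {'music': 1.0, 'calendar': 0.5}
    print(ok)                    # False
```

With the toy data, pooled recall comes out at 0.8, comfortably over the bar, while calendar sits at 0.5. Taking the per-vertical minimum catches exactly the kind of weak vertical that a single aggregate number hides.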