kkukshtel 5 hours ago

"I make AI output lots of stuff" is not an intrinsically valuable thing. I can run the same thing on Claude in research mode and get a report with cited sources, in a more digestible format, on my phone. What's the eval here for whether any of this is good? Is it even possible to test (i.e., you can't really A/B test startup ideas)?

a24venka 5 hours ago

Great question. The core of Spine is coordinating multiple specialized agents across multiple models, using the canvas to store and pass context selectively so each agent works with exactly what it needs.
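For readers wondering what "pass context selectively" could look like in practice, here is a minimal sketch. It is not Spine's actual implementation. `Node`, `Agent`, `Canvas`, and `context_for` are hypothetical names, and the assumption is that each agent declares the kinds of context it needs via tags. The canvas then hands each agent only the matching nodes instead of the full shared state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One piece of context stored on the shared canvas (hypothetical)."""
    node_id: str
    tags: frozenset
    content: str


@dataclass
class Agent:
    """A specialized agent that declares which kinds of context it needs (hypothetical)."""
    name: str
    model: str          # which model backs this agent; illustrative only
    needs: frozenset


@dataclass
class Canvas:
    """Shared store that agents read from and write to (hypothetical)."""
    nodes: dict = field(default_factory=dict)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def context_for(self, agent: Agent) -> list:
        # Return only nodes whose tags overlap the agent's declared needs,
        # preserving insertion order, instead of the whole canvas.
        return [n for n in self.nodes.values() if n.tags & agent.needs]


# Usage: two agents on different models see different slices of the canvas.
canvas = Canvas()
canvas.add(Node("n1", frozenset({"market"}), "Market size notes"))
canvas.add(Node("n2", frozenset({"competitors"}), "Competitor list"))
canvas.add(Node("n3", frozenset({"market", "pricing"}), "Pricing survey"))

pricing_agent = Agent("pricing-analyst", "model-a", frozenset({"pricing"}))
market_agent = Agent("market-researcher", "model-b", frozenset({"market"}))

pricing_ids = [n.node_id for n in canvas.context_for(pricing_agent)]
market_ids = [n.node_id for n in canvas.context_for(market_agent)]
```

The design point is that each agent's prompt stays small and relevant. A real system would presumably also need agents to write their outputs back as new nodes for downstream agents to consume.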

On the eval side, we ran Spine Swarm against GAIA Level 3 and Google DeepMind's DeepSearchQA and hit #1 on both. Full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...