varispeed 6 hours ago

I don't know, I've been using Mythos this week quite sceptically and I found it to be incredibly dumb. For instance, I gave it a dialogue between 3 people and it was constantly mixing up who said what to whom, which looked like early Gemini behaviour. But the latest Opus does that too. It would also make nonsensical inferences about the papers I gave it and only correct itself when I pointed out what it had got wrong. If that is what the US government fears... maybe the fear is that someone follows the dumb things the model suggests.

zmmmmm 6 hours ago | parent | next [-]

it feels like it's mostly just tuned to up its level of capability on long-horizon tasks - stop context rot and keep persisting at all costs until a goal is done.

The base intelligence does not feel much greater to me.

hodgehog11 4 hours ago | parent | prev [-]

This is a ridiculous thing to test on it. Other models are trained on that kind of thing, use those instead.

Fable was designed for _really_ hard software engineering problems. Possibly large, but especially hard. For those tasks, you feel the difference immediately.