> Both conditions used GitHub Copilot (Claude Sonnet 4.5 or Haiku 4.5, depending on study) running in VS Code within isolated Docker containers. The only difference was Mouse tool availability. (https://hic-ai.com/papers/mouse-paper-v13.pdf)

Haiku/Sonnet 4.5 on GitHub Copilot is not a valid comparison whatsoever.

You need to benchmark against Claude Code running Opus. I mean, being revolutionary is a big claim to fame.

handfuloflight 2 hours ago

I guess this is what is meant by AI psychosis?

helloplanets 2 hours ago

Not at all. This looks just like someone trying to make a quick buck, hyping their product up with bad benchmarks.

handfuloflight 2 hours ago

You don't think there's some LLM behind the scenes deeply encouraging them to pursue this as revolutionary, worthy of patent, etc?