WithinReason 2 hours ago

> [...] beating Claude Code (32%) at roughly $0.17 per vulnerability found

Claude Code is an agent harness, not an LLM.

Claude is a brand (or group of LLMs), not an LLM.

raincole an hour ago | parent | next [-]

Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.

mkagenius 24 minutes ago | parent [-]

It looks like the author is specifically avoiding the model's name, because the results are really weird.

  Opus 4.8/4.7 scored 28%

  Opus 4.6 scored 37%

So the author probably thought, let's not get into that and just write Claude.

andriy_koval 16 minutes ago | parent [-]

many people think opus 4.6 was the best

tills13 39 minutes ago | parent | prev | next [-]

It costs nothing to not be pedantic.

Onavo an hour ago | parent | prev [-]

Claude Code is the only way to get access to the actual amortized cost of running a Claude-scale model. The consumer non-enterprise API is extremely expensive (with increasing marginal costs for the user and fat profit margins for Anthropic). If you want to approximate a state-level attacker's cost, where they can run the model on their own hardware, Claude Code is probably the best guess at the amortized cost.