eleventyseven 4 hours ago
> Throughout this series, “we” refers to maderix (human) and Claude Opus 4.6 (by Anthropic) working as a pair. The reverse engineering, benchmarking, and training code were developed collaboratively

Sure, "collaboratively." Why would I ever trust a vibe-coded analysis? How do I, a non-expert in this niche, know that Opus isn't pulling a fast one on both of us? LLMs write convincing bullshit that fools even experts. Have you manually verified each fact in this piece? I doubt it. Thanks for the disclaimer; it saved me from having to read it.
Anonbrit 3 hours ago
Humans also write endless amounts of convincing bullshit, and have done so since time immemorial. Fake papers and faked results were a growing scourge in academia before LLMs were a thing, and that's just counting the intentional fraud: the reproducibility crisis in science, especially medical and psychological science, affects even the best-designed and most well-intentioned studies. Humans also make mistakes and assumptions while reverse engineering, so it will always take more engineers to go through the results and test things.
withinboredom 4 hours ago
Claude likes to hide bad benchmarks from you and show you only where you are clearly winning. You can even see some weird benchmarks in the article.