Stitch4223 6 hours ago

It’s four poorly constructed, arbitrary experiments which say very little about the competency of either model. The article reads like thin, auto-generated AI clickbait for nerd sniping or shilling a model.

Consider the lead:

> DeepSeek V4 Pro wins this head-to-head by being more exact where it matters: following instructions, matching schemas, and solving edge cases cleanly. GPT-5.5 Pro is still strong, but it gave away points with avoidable deviations.

“where it matters”, “cleanly”, “is still strong”, and vague references, instead of stating that DeepSeek yielded more concise results in 3 out of 4 tests. 1 star.
monooso 2 minutes ago

I think you've misunderstood the purpose of a lead (sic). Per Merriam-Webster [^1], a lede is:

> the introductory section of a news story that is intended to entice the reader to read the full story

(Emphasis mine)

You may prefer more matter-of-fact phrasing, of course, but criticising a lede for attempting to achieve its goal is unjustified.
jampekka an hour ago

(Three out of) four experiments is anecdotal for sure, but the result meshes with more established instruction-following benchmarks (although DeepSeek V4 Pro does not top these): https://artificialanalysis.ai/evaluations/ifbench

I found the writing clear and quite even-handed. The lead is a bit salesy, but leads typically are.

Knee-jerk dismissals based on vibes that something is LLM-generated are quite low-effort.