utopiah | 3 hours ago
> academic papers take 6-12 months to come out, by which time the LLM space has often moved on by an entire model generation.

This is a recurring argument which I don't understand. Doesn't it simply mean that whatever conclusions they drew were valid at the time? The research process is about approximating a better description of a phenomenon in order to understand it; it's not about providing a definitive answer. Being "an entire model generation" behind would matter if fundamental problems (e.g. no more hallucinations) had been solved in the meantime, but if progress consists of incremental changes, then most likely the conclusions remain correct. Which fundamental change (I don't think labeling newer models as "better" is sufficient) do you believe invalidates their conclusions in this specific context?
soulofmischief | 34 minutes ago
2025 has been a wild year for agentic coding models. Cutting-edge models in January 2025 don't hold a candle to cutting-edge models in December 2025. Just the jump from Sonnet 3.5 to 3.7 to 4.5, and then to Opus 4.5, has been pretty massive in terms of holistic reasoning, deep knowledge, and better procedural and architectural adherence.

GPT-5 Pro convinced me to pay $200/mo for an OpenAI subscription. Regular 5.2 models, and 5.2 codex, are leagues better than GPT-4 when it comes to solving problems procedurally, using tools, and engaging in deep discussion of scientific, mathematical, philosophical, and engineering problems. Models have increasingly long context windows, especially some Google models. OpenAI has released very good image models, and great editing-focused image models have been released more generally. Predictably better multimodal inference over the short term is unlocking many cool near-term possibilities.

Additionally, we have seen some incredible open source and open weight models released this year, some fully commercially viable without restriction. And more and more smaller TTS/STT projects are in active development, with a few notable releases this year.

Honestly, the landscape at the end of the year is impressive. There has been great work all over the place, almost too much to keep up with. I'm very interested in the Genie models and a few others.

For an idea: at the beginning of the year, I was mildly successful at getting coding models to make changes in some of my codebases, but the more esoteric problems were out of reach. Progress in general was deliberate and required a lot of manual intervention. By comparison, in the last week I've prototyped six applications at levels that would individually take me days to weeks, often developing multiple at the same time, monitoring agentic workflows and intervening only when necessary, relying on long preproduction phases with architectural discussions and development of documentation, requirements, SDDs... and detailed code review and refactoring processes to ensure adherence to constraints. I'm morphing from a very busy solo developer into a very busy product manager.