| ▲ | AugSun 12 hours ago |
| "Most users don't need frontier model performance" unfortunately, this is not the case. |
|
| ▲ | theshrike79 8 hours ago | parent | next [-] |
It depends. If they're using a small/medium local model as a 1:1 ChatGPT replacement as-is, they'll have a bad time. Even ChatGPT refers to external services to get more data. But a local model plus a good harness with a robust toolset will work for people more often than not. The model itself doesn't need to know who the president of Zambia was in 1968, because it has a tool it can use to look that up on Wikipedia. |
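For illustration, the "small model + tool" pattern described above can be sketched in a few lines of Python. This is a minimal toy harness, not any particular product: `local_model` stands in for whatever local runtime wrapper you use (llama.cpp, Ollama, etc.), the `TOOL:` convention is invented purely for the example, and the only real dependency is Wikipedia's public REST summary endpoint.

```python
import json
import urllib.parse
import urllib.request

def wikipedia_summary(title: str) -> str:
    """Fetch a short article summary from Wikipedia's public REST API."""
    url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
           + urllib.parse.quote(title.replace(" ", "_")))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("extract", "")

def run_turn(question: str, local_model) -> str:
    """One simplified harness turn: the model either answers directly or
    asks for a lookup, then answers again using the fetched context."""
    reply = local_model(
        f"Question: {question}\n"
        "If you need facts you don't know, reply with exactly 'TOOL: <article title>'."
    )
    if reply.strip().startswith("TOOL:"):
        context = wikipedia_summary(reply.split("TOOL:", 1)[1].strip())
        reply = local_model(
            f"Question: {question}\n"
            f"Reference material: {context}\n"
            "Answer using only the reference material."
        )
    return reply
```

With a harness like this, answer quality depends far more on the tool and the prompting than on how many facts the model has memorized, which is the point being made above.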
| |
| ▲ | ZeroGravitas 7 hours ago | parent [-] | | You can install the complete text of Wikipedia locally too. These dumps have usually been intended for ereader/off-grid/post-zombie-apocalypse situations, but I'd guess someone is already working on an LLM-friendly way to install it. It would be interesting to know the tradeoffs. The Tiananmen Square example suggests why you'd maybe want the knowledge facts to come from a separate source. | | |
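On the "LLM-friendly local Wikipedia" idea, the crudest version is just retrieval over a local dump: pick the article that best matches the question and paste it into the model's context. The sketch below assumes a hypothetical plaintext dump where each article starts with a "== Title ==" header (real dumps are XML or ZIM and would need a proper parser and index); it only illustrates the tradeoff that the knowledge lives in a file you control rather than in the model's weights.

```python
import re

def load_articles(path: str) -> dict[str, str]:
    """Parse a hypothetical plaintext dump where each article begins with
    a '== Title ==' header line. Real dumps (XML/ZIM) need a real parser."""
    articles, title, body = {}, None, []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            match = re.match(r"^== (.+) ==$", line.strip())
            if match:
                if title is not None:
                    articles[title] = "".join(body)
                title, body = match.group(1), []
            elif title is not None:
                body.append(line)
    if title is not None:
        articles[title] = "".join(body)
    return articles

def best_context(articles: dict[str, str], question: str) -> str:
    """Naive keyword-overlap ranking; a real setup would use a search index
    or embeddings instead of scanning every article."""
    q_words = set(re.findall(r"\w+", question.lower()))
    def score(text: str) -> int:
        return sum(1 for w in re.findall(r"\w+", text.lower()) if w in q_words)
    return max(articles.values(), key=score, default="")
```

The tradeoffs then become concrete: disk space and retrieval quality versus baked-in facts, plus a knowledge source that can be audited and swapped independently of the model.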
|
|
| ▲ | selcuka 11 hours ago | parent | prev | next [-] |
| Any citations? Because that was my impression, too. I want frontier model performance for my coding assistant, but "most users" could do with smaller/faster models. ChatGPT free falls back to GPT-5.2 Mini after a few interactions. |
| |
| ▲ | lxgr 9 hours ago | parent | next [-] | | Have you used GPT instant or mini yourself? I think it’s pretty cynical to assume that this is “good enough for most people”, even if they don’t know the difference between that and better models. | | |
| ▲ | throwaway27448 8 hours ago | parent [-] | | Say more. Why do you think this? | | |
| ▲ | embedding-shape 6 hours ago | parent | next [-] | | They're awful and hallucinate a lot; I couldn't imagine using them even for prompts about TV shows, let alone for serious work. To repeat the question from the parent, have you tried them yourself? Even compared to ChatGPT Thinking, they're little short of useless. | |
| ▲ | lxgr 3 hours ago | parent | prev [-] | | They're essentially replying based on vibes, instead of grounding their responses in extensive web searches, which is what the paid models/configurations generally do. This makes them wrong more often than they're right for anything but the most trivial requests that can be easily responded to out of memorized training data. This is all on top of the (to me) insufferable tone of the non-thinking models, but that might well be how most users prefer to be talked to, and whether that's how these models should accordingly talk is a much more nuanced question. Regardless of that, everybody deserves correct answers, even users on the free tier. If this makes the free tier uneconomical to serve for hours on end per user per day, then I'd much rather they limit the number of turns than dial down the quality like that. |
|
| |
| ▲ | asutekku 10 hours ago | parent | prev [-] | | Frontier models have much better knowledge and usually hallucinate less. It's not about the coding capabilities; it's about how much you can trust the model. | | |
| ▲ | Barbing 10 hours ago | parent [-] | | re: trust: Have you tried the free version of ChatGPT? It is positively appalling. It's like GPT-3.5 but prompted to write three times as much as necessary to seem useful. I wonder how many people have embarrassed themselves, lost their jobs, or been critically misinformed. All of that is easy to do even with state-of-the-art models, but it seems like a guarantee with the bottom sub-slop tier. Is the average person just talking to it about their day or something? | | |
| ▲ | theshrike79 8 hours ago | parent | next [-] | | Even the paid version of ChatGPT tends to use 1,000 words when 10 will do. Try asking it and Claude the same question and compare the answers. I can guarantee you that the ChatGPT answer won't fit on a single screen on a 32" 4K monitor. Claude's will. | |
| ▲ | PhilipRoman 4 hours ago | parent | prev | next [-] | | I use the free version of ChatGPT (without logging in) when I have a one-off question that doesn't need a huge context. Real-world prompt: "when hostapd initializes 80211 iface over nl80211, what attributes correspond to selected standard version like ax or be?"
It works fine and avoids falling into the trap set by the misleading question. It probably works even better for more popular technologies. Yes, it has a higher failure rate, but that's not a dealbreaker for non-autonomous use cases. | |
| ▲ | throwaway27448 8 hours ago | parent | prev | next [-] | | If someone blindly submits chatbot output they deserve to be embarrassed and fired. But I don't think that's going to improve. | |
| ▲ | jychang 9 hours ago | parent | prev [-] | | The free version of ChatGPT is insanely crippled, so that's not surprising. |
|
|
|
|
| ▲ | helsinkiandrew 9 hours ago | parent | prev | next [-] |
> unfortunately, this is not the case Most users are fixing grammar/spelling, summarising/converting/rewriting text, creating funny icons, and looking up simple facts; none of this requires frontier model performance. I have a feeling that if/when Apple releases its on-device LLM/Siri improvements, which can call out to a larger cloud model if needed, the vast majority of people will be happy with what they get for free running on their phone. |
| |
| ▲ | drob518 4 hours ago | parent [-] | | “You are the smartest high school student that has ever lived and on the college track to Harvard or another Ivy League school. Write a 10 page history term paper about Tiananmen Square and the specific events that took place there. Include a bibliography and use footnotes to cite sources.” |
|
|
| ▲ | 8 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | blitzar 8 hours ago | parent | prev | next [-] |
| "Hey dingus, set timer for 30 minutes" |
|
| ▲ | cyanydeez 6 hours ago | parent | prev | next [-] |
Eh, it's weird how the tech world wants to spend trillions on data centers for... what, escaping the permanent underclass? I think the "need" you speak of is a bit of a colored statement. |
|
| ▲ | AugSun 11 hours ago | parent | prev [-] |
| [flagged] |
| |
| ▲ | seanhunter 11 hours ago | parent [-] | | Complaining about downvotes is futile and also against HN guidelines. | | |
| ▲ | AugSun 10 hours ago | parent [-] | | I'm not complaining "about downvotes", LOL. I'm explaining why some people will be replaced by LLMs because of their own "context window" length. |
|
|