solarkraft 5 hours ago

Smells like hyperbole. A lot of people making such claims don’t seem to have continued real world experience with these models or seem to have very weird standards for what they consider usable.

Up until relatively recently, while people had long been making these claims, they came with the asterisk of "oh, but you can't practically use more than a few thousand tokens of context."

derekp7 4 hours ago | parent | next [-]

"Create a single page web app scientific RPN calculator"

Qwen 3.5 122b/a10b (at q3 using unsloth's dynamic quant) is so far the first model I've tried locally that gets a really usable RPN calculator app. Other models (even larger ones that I can run on my Strix Halo box) tend to either not implement the stack right, have non-functional operation buttons, or most commonly the keypad looks like a Picasso painting (i.e., the 10-key pad portion has buttons missing or mapped all over the keypad area).

This seems like such a simple test, but I even just tried it in ChatGPT (whatever model they serve up when you don't log in), and it didn't even have any numerical input buttons. Claude Sonnet 4.6 did get it right too, but that is the only other model I've used that gets this question right.
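For reference, the stack behavior the models keep getting wrong is small: operands push onto a stack, operators pop their arguments and push the result. A minimal sketch of that core logic (names and the operator set here are illustrative, not from any model's output):

```python
import math

def rpn_push(stack, token):
    """Apply one RPN token (number or operator) to the stack in place."""
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "^": lambda a, b: a ** b,
    }
    unary = {"sin": math.sin, "cos": math.cos, "ln": math.log, "sqrt": math.sqrt}
    if token in binary:
        b = stack.pop()          # top of stack is the *second* operand
        a = stack.pop()
        stack.append(binary[token](a, b))
    elif token in unary:
        stack.append(unary[token](stack.pop()))
    else:
        stack.append(float(token))
    return stack

stack = []
for tok in ["3", "4", "+", "2", "*"]:   # (3 + 4) * 2 in RPN
    rpn_push(stack, tok)
print(stack)  # [14.0]
```

Getting the pop order right for non-commutative operators ("-", "/", "^") is exactly the kind of detail the weaker models fumble.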

airstrike an hour ago | parent | next [-]

is your prompt literally 1-sentence?

if so, a better approach would be to ask it to first plan that entire task and give it some specific guidance

then once it has the plan, ask it to execute it, preferably by letting it call other subagents that take care of different phases of the implementation while the main loop just merges those worktrees back

it's how you should be using claude code too, btw

nl 10 minutes ago | parent [-]

Claude Sonnet can easily one-shot that without specifically asking for plan first.

rienko 4 hours ago | parent | prev [-]

We tend to find Qwen3-Coder-Next better at coding, at least on anecdotal examples from our codebases. It's somewhat better at tool calling; maybe the current templates for Qwen3.5 don't yet enjoy as "mature" support as Qwen3 on vllm. On my team, MiniMax2.5 is the current favorite.

tempest_ 5 hours ago | parent | prev | next [-]

Qwen3-Coder-30B-A3B-Instruct is good, I think, for inline IDE integration or operating on small functions or library code, but I don't think you will get very far with the one-shot feature implementation people are currently doing with Claude or whatever.

andy_ppp 4 hours ago | parent | next [-]

I have been one-shotting a feature into a codebase with ChatGPT 5.3 Codex in Cursor. It worked out of the box, but then I realised everything it had done was super weird and it broke under a load of edge cases. I've tried being super clear about how to fix it, but the model is lost. This was not a complex feature at all, so hopefully I'm employed for a few more years yet.

rubyn00bie 4 hours ago | parent | prev [-]

I could be doing something wrong, but I have not had any success with one-shot feature implementations with any of the current models. There are always weird quirks, undesired behaviors, bad practices, or just egregiously broken implementations. A week or so ago, I instructed Claude to do something at compile time and it instead burned a phenomenal amount of tokens before yeeting out the most absurd, convoluted runtime implementation, which didn't even work. At work I use it (or Codex) for specific tasks, delegating specific steps of the feature implementation.

The more I use the cloud-based frontier models, the more virtue I find in using local, open source/weights models, because they tend to produce much simpler code. They require more direct interaction from me, but the end result tends to be less buggy, easier to refactor/clean up, and more precisely what I wanted. I am personally excited to try this new model out shortly on my 5090. If I read the article correctly, it sounds like even the quantized versions have a "million"[1] token context window.

And to note, I’m sure I could use the same interaction loop for Claude or GPT, but the local models are free (minus the power) to run.

[1] I'm dubious it won't shite itself at even 50% of that. But even 250k would be amazing for a local model when I "only" have 32GB of VRAM.

__mharrison__ 4 hours ago | parent | prev [-]

I used the 35b model to create a polars implementation of PCA (no sklearn or imports other than math and polars). In less than 10 minutes I had the code. This is impressive to me considering how poorly all models handled polars until very recently. (They always hallucinated pandas code.)
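What makes that constraint nontrivial is that without sklearn or numpy the model has to hand-roll the eigendecomposition of the covariance matrix. A minimal sketch of that numerical core using only the math module (power iteration in place of a real eigensolver; polars would supply the means/covariances, and this is my illustration, not the commenter's code):

```python
import math

def power_iteration(cov, iters=200):
    """Dominant eigenvalue/eigenvector of a small symmetric matrix
    (list of lists) via power iteration -- the step PCA needs to find
    the first principal component when no eigensolver is available."""
    n = len(cov)
    v = [1.0] * n
    for _ in range(iters):
        # w = cov @ v, then renormalize
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T cov v gives the eigenvalue
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v

# Covariance of data spread mostly along the y = x direction:
cov = [[2.0, 1.9], [1.9, 2.0]]
val, vec = power_iteration(cov)
print(round(val, 2), [round(x, 3) for x in vec])  # 3.9 [0.707, 0.707]
```

Subsequent components would come from deflating the matrix and iterating again; it's slow and only approximate, but it's the kind of self-contained approach the prompt forces.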