Recently I had switched to OpenCode to try out many of the Non-US-Frontier-Labs models. My unexpected favorite model to use was Mercury (a diffusion model). Not because it was “smart” but because it was stupid fast. It was more of a pair-programming experience instead of the SOTA agentic experience of prompting and waiting. Honestly, it was also way more fun and brought back some of the pre-AI coding experience while still getting some benefits of AI. It felt less of a slot machine where you prompt, wait, and hope it went in the right direction. It made me even use the tiny models like Gemini Flash Lite and GPT Mini/Nano more too.

Anyways, so excited for an open-weight model and I hope it performs well. I’ll be testing this ASAP.

▲

onlyrealcuzzo an hour ago | parent | next [-]

If you can run your tests fast and cheaply, and have metrics that show what bad/sloppy code is that are cheap & fast to generate, a worse fast model can outperform a far better far slower model if you value time...

I've had pretty good success with LLMs after putting in place metrics to measure true complexity (not cyclomatic), and automatically pushing back everything until the added complexity is within reason for the feature.

	▲	Daishiman 3 minutes ago \| parent [-]
		What metrics have you found useful?

▲

yeodev an hour ago | parent | prev | next [-]

I wonder how much this will impact locally used models for coding. I can imagine using diffusion models that are x-times faster than Qwen or Gemma 4 - where I have to do more "pre-ai" work which is a good thing and can have a very fast, very cheap model to work with locally. I assume since it doesn't do heavy computing for a long time that it's cheaper to run on local hardware as well?

▲

skybrian an hour ago | parent | prev | next [-]

Could you say more about how you use it? What does your workflow look like?

	▲	vineyardmike 32 minutes ago \| parent [-]
		Imagine you’re entirely pre-AI… to do some work, you read code, think, then write some code across a number of files. Usually then a small dance with compilation/unit tests to address anything broken. Along the way, you use your human judgement on style and quality, and midway through your change you might refactor something based on learned best practices (eg, when to use a static method, or helper class). Today, even the dumbest AI agents can trivially loop through the final dance to get compilation, and often unit tests (depending on scope of failure). Big SOTA agents have OK code quality, but if left unattended or unchecked will still generate pretty sloppy repos after a while. That’s true even when using models like Opus which is ridiculously expensive in comparison. When using the models in this fast “pair programming” style, I find that I (the human) mostly do all the “plan and think” work, and usually point the smaller agent towards specific files/directories, with specific targeted changes. It’s slower than 1-shot prompting an entire feature, but slightly faster than doing it manually, and I find the code is less “slop” because the changes are smaller and more human. It retains the agentic benefits of handing imports, compilation iteration, etc and can do basic cross-file plumbing. It’s also cheap and fast to do iterations like “wait make that method static” or “let’s update this to use <other util class>” and things like that. When the agent is slow to make localized edits, I find I’m less likely to push for minor nit-picks and style updates.

▲

andai an hour ago | parent | prev [-]

So you're making smaller edits?