Note that these are Python-only results; the model will not do as well with other languages.

I'm glad to see more domain-focused SLMs; we need more of them! A programming-focused MoE should work well across many languages.

▲ rcarmo an hour ago | parent | next [-]

If it writes functional Python instead of cosplaying as a Java programmer and cramming code with classes and accessors, it's already better than Opus...

▲ nsingh2 7 hours ago | parent | prev [-]

Lots of confusion about what this model is actually focused on.

It is a cheap specialist for closed-world, verifiable reasoning tasks like math, self-contained coding problems, and similar.

"Closed-world" means the needed information is already in the context. It is not a tool-using agent that can discover missing context. "Verifiable" means answers are hard to generate but easy to check.

So no open-ended research, repo-wide agent work, factual Q&A, or SVG generation. More of a compact reasoning module for bounded problems.

▲ nsingh2 6 hours ago | parent | next [-]

To follow up on this, I had it solve a nasty ODE problem that I saw in the recent Mathematica 15 release post:

    Solve the following first-order ODE for f(x):

    ((-1 - 2*x)*f(x)*tan(1 + x - exp(-61 - 2*x)*f(x)/x)
    + exp(61 + 2*x)*x*(1 - x*tan(1 + x - exp(-61 - 2*x)*f(x)/x))
    + x*tan(1 + x - exp(-61 - 2*x)*f(x)/x)*f'(x)) = 0

    Find the general solution f(x).

And surprisingly it found a valid solution! Extra impressive because it runs at 25 tok/s on my measly RTX 2070 Super.

    f(x) = x*exp(61 + 2*x)*(1 + x - arccos(C/x))

    C is an arbitrary constant.

Apparently Mathematica 14.3 couldn't solve this ODE.
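
For anyone who wants to check the claimed solution without a CAS, here is a minimal numeric sketch using only the standard library. It plugs the quoted f(x) into the left-hand side of the ODE, estimates f'(x) with a central difference, and divides by exp(61 + 2x) so the result is on a readable scale. The function names, the step size `h`, and the sample (x, C) pairs are my own choices for illustration, and a small residual at a few points is evidence, not a proof.

```python
import math


def candidate(x, c):
    """The solution quoted above: f(x) = x*exp(61 + 2x)*(1 + x - arccos(C/x))."""
    return x * math.exp(61 + 2 * x) * (1 + x - math.acos(c / x))


def scaled_residual(f, x, h=1e-6):
    """Left-hand side of the ODE at x, divided by exp(61 + 2x)."""
    fx = f(x)
    dfx = (f(x + h) - f(x - h)) / (2 * h)  # central-difference estimate of f'(x)
    e = math.exp(61 + 2 * x)
    t = math.tan(1 + x - math.exp(-61 - 2 * x) * fx / x)
    lhs = (-1 - 2 * x) * fx * t + e * x * (1 - x * t) + x * t * dfx
    return lhs / e


# Sample points chosen so that |C/x| < 1 (arccos is defined) and C/x is not near 0.
for x, c in [(2.0, 1.0), (3.0, -1.5), (1.5, 0.5)]:
    r = scaled_residual(lambda s: candidate(s, c), x)
    print(f"x={x}, C={c}: scaled residual = {r:.2e}")
```

The residuals should come out tiny compared with the individual terms, which are of order 1 to 100 after scaling. As a sanity check that the test can fail, dropping the arccos term (f(x) = x*exp(61 + 2x)*(1 + x)) makes the tangent vanish and leaves a scaled residual of about x.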

▲ le-mark an hour ago | parent | next [-]

How do you know it's a valid solution? Are you able to verify it yourself?

▲ trick-or-treat 4 hours ago | parent | prev | next [-]

How do we know the solution isn't in the weights though?

▲ kame3d 4 hours ago | parent | prev [-]

Interesting! I just tried the quantized Q4_K_M from [1] on my RTX 2070 Super. It ran at 110 tok/s with 1800 tok/s prefill and found the same solution to your prompt. It generated valid LaTeX for the answer, but its reasoning trace uses mostly compact ASCII math notation. It took 3 min 22 s to answer, spending 22k tokens, almost all on thinking.

[1] https://huggingface.co/prithivMLmods/VibeThinker-3B-GGUF

▲ skeledrew 4 hours ago | parent | prev [-]

If it can code well then once you put it in a loop with an interpreter it can do anything.
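
A rough sketch of what such a loop might look like, using only the standard library. Everything here is hypothetical: `fake_model` is a stand-in for a real model call, the `generate(task, feedback)` signature is my own invention, and a real setup would need sandboxing and timeout handling that this omits.

```python
import subprocess
import sys


def run_python(code, timeout=10):
    """Run code in a fresh interpreter; return (ok, combined stdout and stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr


def solve_in_loop(generate, task, max_rounds=3):
    """Ask `generate` for code, run it, and feed errors back until it runs cleanly."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)
        ok, output = run_python(code)
        if ok:
            return output
        feedback = output  # the traceback becomes context for the next attempt
    return None  # gave up after max_rounds failed attempts


def fake_model(task, feedback):
    # Hypothetical stand-in for a model: wrong at first, corrected once it sees an error.
    if not feedback:
        return "print(undefined_name)"
    return "print(sum(range(10)))"


print(solve_in_loop(fake_model, "sum the integers 0..9"))
```

Note that "runs without an error" is a much weaker check than "is correct", so in practice the loop is only as good as whatever verifier sits behind it.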