jjcm 4 hours ago

1 bit with a FP16 scale factor every 128 bits. Fascinating that this works so well.
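That scheme amounts to storing only the sign of each weight, plus one shared FP16 scale per block of 128. A toy sketch of the idea (mine, not the model's actual code; the group size and the mean-|w| choice of scale are illustrative assumptions):

```python
# Toy sketch of 1-bit quantization with one scale factor per group of
# 128 weights. Real implementations pack the sign bits and store the
# scale in FP16; here we keep plain Python lists for clarity.

def quantize(weights, group=128):
    """Return (signs, scales): signs are +/-1 per weight, one scale per group."""
    signs, scales = [], []
    for i in range(0, len(weights), group):
        chunk = weights[i:i + group]
        # Mean absolute value as the per-group scale (a common choice).
        scales.append(sum(abs(w) for w in chunk) / len(chunk))
        signs.extend(1 if w >= 0 else -1 for w in chunk)
    return signs, scales

def dequantize(signs, scales, group=128):
    """Reconstruct approximate weights: sign times its group's scale."""
    return [s * scales[i // group] for i, s in enumerate(signs)]
```

Each weight costs 1 bit plus 16/128 bits of amortized scale, about 1.125 bits per weight, which matches the "1 bit with a FP16 scale factor every 128" description.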

I tried a few things with it. Got it driving Cursor, which in itself was impressive - it handled some tool usage. Via Cursor I had it generate a few web page tests.

On a Monte Carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but it left behind some stray symbols that caused things to fail. Required a bit of manual editing.
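For reference, the core logic the model got right is small - a minimal Monte Carlo π estimate looks like this (my sketch, not the model's output):

```python
import random

def estimate_pi(samples=100_000, seed=0):
    """Estimate pi by sampling random points in the unit square and
    counting the fraction that land inside the quarter circle x^2 + y^2 <= 1.
    That fraction approaches pi/4 as samples grow."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples
```

With 100k samples the estimate is typically within about 0.01 of π, so the hard part of the prompt really was the interface, not the math.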

Tried a Simon Willison pelican as well - very abstract, not recognizable at all as a bird or a bicycle.

Pictures of the results here: https://x.com/pwnies/status/2039122871604441213

There doesn't seem to be a demo link on their webpage, so here's a llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev

najarvg 4 hours ago | parent | next [-]

Thanks for sharing the link to your instance. It was blazing fast in responding. Tried throwing a few things at it, with the following results:

1. Generating an R script to take a city and country name, find its lat/long, and map it using ggmaps. Generated a pretty decent script (could be more optimal, but impressive for the model size) with warnings about using geojson if possible.

2. Generating a LaTeX script to display the Gaussian integral equation. It generated a (I think) non-standard version using probability distribution functions instead of the general version, but I still give it points for that. It gave explanations of the formula and parameters, as well as instructions on how to compile the script using bash etc.

3. Generating a LaTeX script to display the Euler identity equation - this one it nailed.
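For anyone comparing, the standard forms those two prompts were after (the model reportedly produced a probability-density variant of the first):

```latex
\[ \int_{-\infty}^{\infty} e^{-x^{2}}\,dx = \sqrt{\pi} \]
\[ e^{i\pi} + 1 = 0 \]
```

The probability-density variant substitutes the normal distribution's normalized form, \( \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1 \), which is equivalent but not the textbook statement of the Gaussian integral.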

Strongly agree that the knowledge density is impressive for a 1-bit model with such a small size and such blazing fast responses

jjcm 4 hours ago | parent | next [-]

> Was blazing fast in responding.

I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.

abrookewood an hour ago | parent | next [-]

Holy hell ... that's a monster of a card

ineedasername an hour ago | parent | prev [-]

consumer hardware?

That... pft. Nevermind, I'm just jealous

jjcm an hour ago | parent [-]

Look it was my present to myself after the Figma IPO (worked there 5 years). If you want to feel less jealous, look at the stock price since then.

najarvg 3 hours ago | parent | prev [-]

I must add that I also tried the standard "should I walk or drive to the carwash 100 meters away to wash the car" and it made the usual error of suggesting a walk, given the distance and health reasons etc. But then this does not claim to be a reasoning model, and I did not expect, in the remotest case, for this to be answered correctly. Even previous-generation larger reasoning models struggle with this

jjcm 3 hours ago | parent [-]

[dead]

andai 2 hours ago | parent | prev | next [-]

Thanks. Did you need to use Prism's llama.cpp fork to run this?

jjcm an hour ago | parent [-]

Yep.

andai 10 minutes ago | parent [-]

Could you elaborate on what you did to get it working? I built it from source, but couldn't get it (the 4B model) to produce coherent English.

Sample output below (the model's response to "hi" in the forked llama-cli):

X ( Altern as the from (.. Each. ( the or,./, and, can the Altern for few the as ( (. . ( the You theb,’s, Switch, You entire as other, You can the similar is the, can the You other on, and. Altern. . That, on, and similar, and, similar,, and, or in

abrookewood an hour ago | parent | prev | next [-]

man, that is really really quick. What is your desktop setup??? GPU?

jjcm 30 minutes ago | parent [-]

It is fast, but I do have good hardware. A few people have asked for my local inference build, so I have an existing guide that mirrors my setup: https://non.io/Local-inference-build

rjh29 2 hours ago | parent | prev | next [-]

It reminds me of very early ChatGPT: mostly correct answers, but some nonsense. Given its speed, it might be interesting to run it through a 'thinking' phase where it double-checks its answers, and/or use search grounding, which would make it significantly more useful.

pdyc an hour ago | parent | prev | next [-]

thanks, i tested it - it failed the strawberry test. qwen 3.5 0.8B at a similar size passes it and is far more usable.
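The strawberry test just asks a model to count letter occurrences in a word, which is trivially checkable in code:

```python
def count_letter(word, letter):
    """Ground truth for the strawberry test: count case-insensitive
    occurrences of a letter in a word. Models often answer 2 for
    the "r"s in "strawberry"; the correct answer is 3."""
    return word.lower().count(letter.lower())
```

It trips up LLMs because tokenizers split words into multi-character chunks, so the model never directly "sees" individual letters.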

adityashankar 4 hours ago | parent | prev | next [-]

here's the google colab link, https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5... since the ngrok link likely got DDoSed by the number of individuals coming along

jjcm 4 hours ago | parent [-]

Good call. Right now though traffic is low (1 req per min). With the speed of completion I should be able to handle ~100x that, but if the ngrok link doesn't work defo use the google colab link.

adityashankar 4 hours ago | parent [-]

The link didn't work for me personally, but that may be a bandwidth issue with me fighting for a connection in the EU

uf00lme 4 hours ago | parent | prev | next [-]

The speed is impressive. I wish it could be set up for something similar to speculative decoding
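In speculative decoding, a small fast draft model proposes several tokens and the large model verifies them, keeping the longest agreeing prefix. A greedy toy sketch (all function names hypothetical stand-ins, not any real API):

```python
def speculative_decode(draft_next, target_next, prompt, n_draft=4, max_new=16):
    """Toy greedy speculative decoding.

    draft_next / target_next: functions mapping a token list to the next
    token (stand-ins for the small draft model and the large target model).
    The draft proposes n_draft tokens; the target keeps the agreeing
    prefix, then contributes one token of its own.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # Draft model proposes a short continuation cheaply.
        proposal = []
        for _ in range(n_draft):
            proposal.append(draft_next(tokens + proposal))
        # Target model verifies the proposal token by token.
        accepted = []
        for tok in proposal:
            if target_next(tokens + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # Keep the verified prefix, plus one token from the target.
        tokens += accepted
        tokens.append(target_next(tokens))
    return tokens[len(prompt):len(prompt) + max_new]
```

The win is that the target model checks a whole proposed run in one forward pass instead of one pass per token; a very fast 1-bit model would make a natural draft model for a larger target.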

hmokiguess 4 hours ago | parent | prev | next [-]

wow that was cooler than I expected, curious to embed this for some lightweight semantic workflows now

tristanMatthias 3 hours ago | parent | prev [-]

[dead]