Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)(antigma.ai)
6 points by ubermon 9 hours ago | 10 comments
annjose 8 hours ago | parent | next [-]

> today's best runnable-offline model is roughly 6–8 months behind today's frontier.

But it doesn't matter much, because frontier models were already extremely good 8 months ago and we were doing real work with them. Now we also have more capable open-source agents like pi and OpenCode, which work well with these models.

More importantly, offline models are the best choice for privacy, on-device inference, and freedom from token/cost anxiety.

swrrt 3 hours ago | parent | next [-]

Yep, offline mode is useful for edge devices too. I'm actually considering deploying an extremely small model on a Steam Deck.

ubermon 6 hours ago | parent | prev [-]

Totally agree! I think we're very early in discovering the full potential of local models.

merkleforest 7 hours ago | parent | prev | next [-]

> 2. How local use feels in practice

Do we have stats on how these models do on Mac M-series chips?

ubermon 6 hours ago | parent [-]

Not yet; I'll run a more comprehensive benchmark later.

timothyshen123 8 hours ago | parent | prev | next [-]

Interesting find. Thanks for sharing!

ubermon 6 hours ago | parent [-]

Thank you! I think there's a lot to dig into later with different hardware, inference engines, prompt/harness setups, etc.

debpack 8 hours ago | parent | prev | next [-]

this is super sick man

ubermon 6 hours ago | parent [-]

Thanks!
