buryat 6 hours ago

I have a Mac Studio with 512GB RAM and ran models of different sizes to test how capable local agents are. I agree that local models aren't there yet, but that depends on how much knowledge you need to answer your question, and I think it should be possible to distill or train a smaller model that works on a subset of knowledge tailored toward local execution. My main interest is in reducing latency, and local agents running at high speed feel like the answer, but it's not something anyone is trying to solve yet. If I could get a smaller model running at incredible speed locally, that could unlock some interesting autoresearching.
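A quick back-of-envelope on what fits in 512GB: weight memory scales with parameter count times bits per weight, which is why quantization is what makes large models viable locally. A minimal sketch (the function name and example sizes are my own illustration, not from any specific model card):

```python
# Rough estimate of weight memory for a quantized model; ignores the
# KV cache and activations, which add more on top at long contexts.

def model_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a given quantization level."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model at 4-bit needs roughly 35 GB of weights, while a 405B model
# at 8-bit needs ~405 GB -- uncomfortably close to a 512 GB ceiling once
# the KV cache is included.
print(round(model_memory_gb(70, 4), 1))   # ~35.0
print(round(model_memory_gb(405, 8), 1))  # ~405.0
```

The same arithmetic is why the smaller-distilled-model idea is appealing: a 7B model at 4-bit is under 4 GB of weights and leaves the memory bandwidth budget free for speed.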

robwwilliams 4 hours ago | parent | next [-]

Also running gemma-4, on an Apple M5 Max. As fast as or faster than Opus 4.6 extended, though of course not at the same level of competence. However, llama.cpp gives great tunability, and there are no IP-leakage concerns.

musicale 2 hours ago | parent | prev | next [-]

> Mac Studio with 512GB Ram

Nice to score one of those.

verdverm 6 hours ago | parent | prev [-]

I've been running Gemma4; my initial experiments put it around gemini-3-flash level (vibe evals).