2ndorderthought 2 hours ago

I've been saying it for a long time now: I think small models are the future for LLMs. It's been fun watching experiments that test just how much better models get by making them insanely large, but it's not sustainable.

No, I am not saying this model is a drop-in Claude replacement. But I think in 2 years we might be really surprised by what can be done on a desktop with commodity hardware, no connection to the internet, and a few models that each cover a subset of tasks.
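
For a sense of why that's plausible, some back-of-envelope weight-memory math of my own (nothing specific to this model, and it ignores the KV cache and activations): weights take roughly parameters times bytes per parameter, so a 32B model quantized to 4 bits is about 16 GB and already fits on a single consumer GPU.

    #include <stdio.h>

    /* Rough weight footprint: N billion params at B bits each is
       N * B / 8 GB (decimal). Real usage is higher once the KV cache
       and activations are added. */
    int main(void) {
        double params_b[] = {7, 32, 70, 200}; /* billions of parameters */
        double bits[] = {16, 8, 4};           /* fp16, int8, 4-bit quant */
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 3; j++)
                printf("%4.0fB params @ %2.0f-bit -> ~%5.1f GB\n",
                       params_b[i], bits[j], params_b[i] * bits[j] / 8.0);
        return 0;
    }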

Really happy to see AMD throw their hat in the ring. It's a good day for AMD investors. I know a lot of AI bros will scoff at this, but having your first training run is a big deal for a new lab. AMD is on their way despite Nvidia having years of runway.

zimi-24-imiz 2 hours ago | parent | next [-]

Using C was 100 times as productive as assembly. What happened was not that we finished software 100 times faster, but that we did projects 100 times bigger in the same time.

Same thing with smol local LLMs versus the big ones in the sky: your smol local LLM will only be able to tackle projects which are no longer commercially valuable, because people expect 100x the scope and features. That's fine as a hobby/art project.

Yes, we'll do amazing things with local LLMs in 2 years, but the big LLMs will do things beyond imagination (assembly vs. C).
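
To make the leverage concrete, a toy sketch of my own (not from the thread): each C line below stands in for the hand-written instructions an assembly programmer would write and maintain themselves.

    #include <stdio.h>

    /* The loop body compiles to roughly mov/add/inc/cmp/jl on x86-64;
       register allocation, addressing modes, and branch labels are all
       chosen by the compiler instead of maintained by hand. */
    long sum(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    int main(void) {
        long a[] = {1, 2, 3, 4, 5};
        printf("%ld\n", sum(a, 5)); /* prints 15 */
        return 0;
    }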

2ndorderthought an hour ago | parent [-]

I disagree. I think people can make very good software by balancing their use of AI with their market knowledge. I still believe that for the foreseeable future people can make wildly loved or mission-critical software with zero AI and have it be met with market interest.

I think we are going to see a surge in software claiming to do everything and becoming bloated and unsustainable.

I already see single-GPU local models one-shotting games via vibe coding. I see people doing agentic programming, granted more slowly and more cheaply than with 12 Claude sessions.

The gap isn't as big as it was 2 months ago. In the past 45 days there has been a flood of model releases, while frontier performance has stagnated or even degraded. If this is a taste of what is to come, I welcome it.

hparadiz an hour ago | parent [-]

I'm about two months into a vibe-coded C project. My issues are the same as ever: how to pack memory, which syscalls to run and when, whether the program is stable after running for 24 hours. When I want to make a change, it's usually a trade-off with something else. There's no accounting for taste among humans, let alone among an LLM. It's great at implementing my ideas but terrible at coming up with those ideas. Architecture is always going to be king.
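
Memory packing is a good example of a decision the model won't make for you. A minimal sketch of my own (not from the project) of why field order alone matters:

    #include <stdio.h>
    #include <stdint.h>

    /* The compiler aligns each member, so interleaving small and large
       fields wastes padding bytes. */
    struct bad {
        uint8_t  flag;  /* 1 byte + 7 bytes padding */
        uint64_t id;    /* 8 bytes */
        uint8_t  kind;  /* 1 byte + 7 bytes tail padding */
    };                  /* typically 24 bytes */

    struct good {
        uint64_t id;    /* 8 bytes */
        uint8_t  flag;  /* 1 byte */
        uint8_t  kind;  /* 1 byte + 6 bytes tail padding */
    };                  /* typically 16 bytes */

    int main(void) {
        printf("bad:  %zu bytes\n", sizeof(struct bad));
        printf("good: %zu bytes\n", sizeof(struct good));
        return 0;
    }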

sheepscreek 8 minutes ago | parent [-]

Models are heavily fine-tuned and trained to follow instructions; they are trained to be subservient. I am sure that cuts into their ability to think creatively. The other risk with a lot of creative thinking is hallucination (creative thinking = trying what's not in the training set = hallucination, basically). So I will rephrase creative thinking as desired or useful hallucination that stays firmly within the constraints of the prompt.

If that sounds complicated, that's because it is! It's a tricky balance to get right. I think the current architecture of most GPT models isn't sufficient to solve this problem for good. I suppose we need more research into what constitutes desirable vs. undesirable hallucination and how to shift the balance towards the former.
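
One concrete knob for that balance, as an illustration of my own (not specific to any of these models), is sampling temperature: low temperature sharpens the output distribution toward the safest token, while high temperature flattens it and invites exactly the exploratory behavior described above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>

    #define VOCAB 4

    /* Softmax sampling with temperature: T -> 0 approaches argmax
       (predictable), larger T flattens the distribution (exploratory,
       and more likely to wander off the training distribution). */
    static int sample_with_temperature(const double *logits, double temp) {
        double maxl = logits[0];
        for (int i = 1; i < VOCAB; i++)
            if (logits[i] > maxl) maxl = logits[i];

        double probs[VOCAB], sum = 0.0;
        for (int i = 0; i < VOCAB; i++) {
            probs[i] = exp((logits[i] - maxl) / temp); /* stable softmax */
            sum += probs[i];
        }

        double r = ((double)rand() / RAND_MAX) * sum;
        for (int i = 0; i < VOCAB; i++) {
            r -= probs[i];
            if (r <= 0.0) return i;
        }
        return VOCAB - 1;
    }

    int main(void) {
        srand((unsigned)time(NULL));
        double logits[VOCAB] = {2.0, 1.0, 0.5, 0.1}; /* toy vocabulary */
        double temps[] = {0.2, 1.0, 1.8};
        for (int t = 0; t < 3; t++) {
            int counts[VOCAB] = {0};
            for (int k = 0; k < 10000; k++)
                counts[sample_with_temperature(logits, temps[t])]++;
            printf("T=%.1f -> %5d %5d %5d %5d\n", temps[t],
                   counts[0], counts[1], counts[2], counts[3]);
        }
        return 0;
    }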

steveharing1 an hour ago | parent | prev [-]

You couldn't be any more right!

zimi-24-imiz an hour ago | parent [-]

But he could be absolutely right.

steveharing1 an hour ago | parent [-]

He could be right, but time will tell whether we can really achieve that level in the open-source space. As you know, even in open source, companies go closed once they achieve something really efficient and frontier. I'm not saying all of them do, but that's usually the pattern.

2ndorderthought an hour ago | parent | next [-]

There are a lot of hats in the ring. I don't see Alibaba shutting down anytime soon; they make Qwen.

DeepSeek is doing valuations right now.

Moonshot is just getting started, and so is AMD. Mistral is still working hard at it and has a customer base.

An Egyptian company dropped its first small model, Horus, this month.

There are enough geopolitics at play that I expect this to be a very different outcome from typical startup market dynamics. If anything, I worry about the big US labs' longevity. The world seems fed up with US tech, and even for US citizens it's questionable whether the frontier labs have their interests in mind as they put the entire economy at risk.

adrian_b an hour ago | parent | prev [-]

That is a danger, but for now it seems rather distant.

OpenAI has released a couple of open-weights models in the past, but it does not seem to be planning any more.

Leaving aside OpenAI and Anthropic, Zyphra, with this announcement, is the 12th company to have announced new and improved open-weights models during the last couple of months.

Half of these 12 companies have launched not only small models with fewer than 128B parameters, but also big models ranging from over 200B to over 1T parameters.

So for now there is healthy competition, and the open-weights offerings are diverse and numerous.

(The 12 directories on huggingface.co: deepseek-ai, google, ibm-granite, LiquidAI, MiniMaxAI, mistralai, moonshotai, nvidia, Qwen, XiaomiMiMo, zai-org, Zyphra.)