wombatpm 2 hours ago

Couple of observations:

Companies used to hoard talent. Now they hoard compute, RAM, and GPUs.

Deepseek showed that there may be less expensive ways to train, so the eye-watering future expenses may never materialize.

Bigger models may not keep scaling. The future may be federations of smaller expert models. ChatGPTX doesn't need to know everything about mental health; it just needs to recognize that the Sigmund von Shrink mental health model should answer some of my questions.
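
A minimal sketch of that routing idea, assuming a simple keyword router in front of a registry of specialist models (all names here, like route_query and EXPERTS, are hypothetical illustrations, not any real product's API):

    # Hypothetical "router + federation of experts" sketch.
    # A real router would be a small classifier model, not keyword matching.

    EXPERTS = {
        "mental_health": "specialist mental-health model",
        "coding":        "code-specialist model",
        "general":       "small generalist fallback model",
    }

    MENTAL_HEALTH_KEYWORDS = {"anxiety", "therapy", "depression", "stress"}
    CODING_KEYWORDS = {"python", "bug", "compile", "function"}

    def route_query(query: str) -> str:
        """Pick which expert should answer; the generalist never needs
        deep domain knowledge, only enough signal to delegate."""
        words = set(query.lower().split())
        if words & MENTAL_HEALTH_KEYWORDS:
            return "mental_health"
        if words & CODING_KEYWORDS:
            return "coding"
        return "general"

    print(route_query("I have been under a lot of stress lately"))  # mental_health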

chipgap98 2 hours ago | parent

Deepseek showed that distillation is possible. Their results aren't possible without someone else doing the leading-edge training first.
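
For context, distillation here means training a small student model to imitate a larger teacher's output distribution, which is why it presupposes the expensive teacher. A minimal PyTorch-style sketch of the standard soft-label distillation loss (the teacher/student logits are placeholders you'd get from your own models):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """Standard soft-label distillation: KL divergence between the
        teacher's and student's temperature-softened distributions."""
        t = temperature
        soft_targets = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # Scale by t**2 so gradient magnitude stays comparable across temperatures.
        return F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * (t ** 2)

    # Toy usage: random logits stand in for real model outputs.
    student = torch.randn(4, 10, requires_grad=True)
    teacher = torch.randn(4, 10)
    print(distillation_loss(student, teacher))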