lifeformed 9 hours ago

Is it actually that hard to make good models, or is it just about the amount of resources you have for training? (This is an actual question; I really don't know.) I'm sure it's not trivial, but does it really take world-class secret knowledge to build on the known existing techniques? I feel like there's tons of low-hanging fruit still to explore, and time and resources are the limiting factors.

MostlyStable 9 hours ago

The gap between Grok and Gemini on one side and Claude and ChatGPT on the other suggests that yes, it is that hard.

arw0n 4 hours ago

I suspect that Grok has, ironically, been lobotomized by pressure to correct its political views.

Similarly, I could imagine the Gemini folks working in a significantly more complex corporate climate, with different parts of Google pushing for different capability focuses. They are lagging by less than a year, so the gap isn't too large yet.

That said, the fact that Anthropic is currently the top dog suggests that talent and execution are incredibly important. A year ago none of my normie friends knew them, and when I suggested using Claude, they looked at me the way people do when I recommend Linux.

janalsncm 2 hours ago

That shouldn’t affect Grok’s coding ability. How often are people discussing politics with Claude Code? Writing decent code is just hard, and it’s not just Grok.

thot_experiment 35 minutes ago

Not true; aggressive post-training makes models notably dumber.

bwhiting2356 2 hours ago

It affects their ability to hire and retain talent.

janalsncm 2 hours ago

If training a good model requires talent, then that’s the answer to the question this thread is trying to answer: is training a good model actually that hard?

black_knight 2 hours ago

Why would these be independent?

janalsncm 2 hours ago

More specifically, political lobotomy shouldn’t affect coding ability.

girvo 43 minutes ago

You’d be quite surprised, I think. Fine-tuning a model on one axis can have drastic effects on another axis that we, as humans, would expect to be completely unrelated.

Discordian93 an hour ago

Yet empirically it does.

Hamuko 30 minutes ago

It's all a bunch of weights, isn't it? Why wouldn't fiddling with some parts of the weights have cascading effects?
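To make the shared-weights intuition concrete, here is a minimal toy sketch under loud assumptions: a two-weight linear model stands in for an LLM, and the "politics", "coding", and orthogonal feature vectors are hypothetical, not anything measured from Grok or any real model. Gradient descent on one input moves the weights along that input's feature direction, so any other input that shares features (a non-orthogonal vector) shifts too; only an exactly orthogonal input is left untouched.

```python
# Toy illustration only (assumption: a 2-weight linear "model", nothing like a real LLM).
# Fine-tuning on one input changes the output on another input whenever the two
# inputs share features, i.e. their feature vectors are not orthogonal.

def predict(w, x):
    """Linear model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fine_tune(w, x, target, lr=0.1, steps=100):
    """Plain gradient descent on squared error (predict(w, x) - target)**2 for one example."""
    w = list(w)
    for _ in range(steps):
        err = predict(w, x) - target
        # Gradient of err**2 with respect to w_i is 2 * err * x_i
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w


w0 = [1.0, 1.0]
politics_prompt = [1.0, 0.5]     # hypothetical features for the "politics" task
coding_prompt = [0.2, 1.0]       # hypothetical features for "coding"; overlaps with politics
orthogonal_prompt = [-0.5, 1.0]  # dot product with politics_prompt is exactly 0

# "Post-train" only on the politics input, pushing its output to 0
w1 = fine_tune(w0, politics_prompt, target=0.0)

for name, x in [("politics", politics_prompt),
                ("coding", coding_prompt),
                ("orthogonal", orthogonal_prompt)]:
    print(f"{name:>10}: {predict(w0, x):.4f} -> {predict(w1, x):.4f}")
```

Only the politics input was trained on, yet the coding output drops from 1.2 to about 0.36, while the orthogonal input stays at 0.5. In a real network with billions of weights, features are rarely cleanly orthogonal, which is one plausible reason an update aimed at one behavior can spill over into others.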

fwipsy 8 hours ago

Not hard to be a fast follower. Lots of companies are ~6-9 months behind. Reaching the actual bleeding edge is much harder.