Is it actually that hard to make good models, or is it just about the amount of resources you have for training? (This is an actual question, I really don't know.) I'm sure it's not trivial, but does it really take world-class secret knowledge to build on the known existing techniques? I feel like there's tons of low-hanging fruit still to explore, and time and resources are the limiting factor.

▲

MostlyStable 9 hours ago | parent | next [-]

The gap between Grok and Gemini on one side and Claude and ChatGPT on the other suggests that yes, it is that hard.

▲

arw0n 4 hours ago | parent [-]

I suspect that Grok has been ironically lobotomized by pressures to correct its political views.

Similarly, I could imagine the Gemini folks working in a significantly more complex corporate climate, with different parts of Google pushing for different capability focuses. They are lagging by less than a year, so the gap isn't too large yet.

That said, the fact that Anthropic is currently the top dog suggests that talent and execution are incredibly important. A year ago none of my normie friends knew them, and when I suggested using Claude, they looked at me the way they do when I recommend Linux.

▲

janalsncm 2 hours ago | parent [-]

That shouldn’t affect Grok’s coding ability. How often are people discussing politics with Claude Code? Writing decent code is just hard, and it’s not just Grok.

▲

thot_experiment 35 minutes ago | parent | next [-]

Not true; aggressive post-training makes models notably dumber.

▲

bwhiting2356 2 hours ago | parent | prev | next [-]

It affects their ability to hire and retain talent.

▲

janalsncm 2 hours ago | parent [-]

If training a good model requires talent then that’s the answer to the question this thread is trying to answer: is training a good model actually that hard?

▲

black_knight 2 hours ago | parent | prev [-]

Why would these be independent?

▲

janalsncm 2 hours ago | parent [-]

More specifically, political lobotomy shouldn’t affect coding ability.

▲

girvo 43 minutes ago | parent | next [-]

You’d be quite surprised, I think. Fine-tuning a model on one axis can have drastic impacts on another that we, as humans, would expect to be completely unrelated.

▲

Discordian93 an hour ago | parent | prev | next [-]

Yet empirically it does.

▲

Hamuko 30 minutes ago | parent | prev [-]

It's all a bunch of weights, isn't it? Why wouldn't fiddling with some parts of the weights have cascading effects?

▲

fwipsy 8 hours ago | parent | prev [-]

Not hard to be a fast follower. Lots of companies are ~6-9 months behind. Reaching the actual bleeding edge is much harder.