They have developed an LLM, so they are an AI lab, but the quality of that model suggests they're not a frontier anything.

leetharris 6 hours ago

I have the pro account for ChatGPT, Claude, Gemini, and Grok.

They all have various strengths and weaknesses. My favorite is still ChatGPT, then Gemini/Claude, then Grok.

Grok often feels 1-2 generations behind the competition in general use, but it has three things that I love:

1. It seems to be the best at understanding current events. Maybe due to X integration, or some other tool call optimization in the backend? I don't know, but I often ask about things going on, and the other models have outdated info, give unhelpful answers, etc.

2. It is generally the least sycophantic for personal things. Anthropic is getting there too. ChatGPT and Gemini are working on this, but previous models in those families would almost never say anything negative about what I am doing. Sometimes I need career advice, personal advice, etc., and I like the tone of how it responds. I think Claude will catch up soon.

3. For professional work, there are certain topics that other models would refuse to engage with. At my last company we had an enormous number of legal users. When we needed a deposition summarized on certain topics, most models would refuse. Grok would not. I understand the need for safety and I don't blame the other model providers, but for some professional use cases you NEED a model that is capable of handling sensitive subjects.

e9 5 hours ago

I recently worked with an NRC dataset, specifically about nuclear reactor events and status reports (example: https://www.nrc.gov/reading-rm/doc-collections/event-status/...). Public data that just needed some cleaning. Several times the Claude API refused to engage. Because of that, I can't trust Claude to clean production data sets.

emodendroket 2 hours ago

> 1. It seems to be the best at understanding current events. Maybe due to X integration, or some other tool call optimization in the backend? I don't know, but I often ask about things going on, and the other models have outdated info, give unhelpful answers, etc.

That makes sense, but occasionally you ask about an issue where it's clearly received political instruction from the commissar and it acts totally lobotomized. But it's true that Gemini will often blithely state that something could never happen and you'll say "what do you mean, that just happened" and then it comes back apologizing after running a Web search.

square_usual 5 hours ago

Opus 4.8 has made huge jumps in being less sycophantic. I see it pushing back on ideas a lot, and that's very helpful when you're evaluating options.

lachlan_gray 5 hours ago

Almost too much so; it often feels like Opus is pushing back for the sake of pushing back, the way old models used to add disclaimers to every message regardless of content.

NewJazz 5 hours ago

That's because it can't literally reason, it has just been manually steered into those reasoning speech cycles.

emodendroket 2 hours ago

Yes, yes. Does everyone still find it interesting to go over this point every time about how it's not literally a person with human reasoning?

NewJazz an hour ago

Uh, only when people don't seem to understand it, or try to personify it. Which is quite often.

deaton 5 hours ago

All 4 of these still regularly insist that I am a genius and everything I say is brilliant. Grok definitely pushes back more than the others, but I don't like how sycophantic they all still are.

pell 4 hours ago

I don’t want to open up that whole can of worms, but Grok on any vaguely philosophical or political topic is a scaredy cat and has a very hard time staying factual if the facts could make Musk or the conservative movement look bad.

nonethewiser 5 hours ago

What are you using it for? I'm pretty surprised ChatGPT is your top model, but maybe you aren't using it for code.

Azantys 5 hours ago

Career and personal advice from LLMs? Not sure that's your best bet.

cactusplant7374 5 hours ago

But in terms of agentic coding? Dead last.

htx80nerd 4 hours ago

My favorite was ChatGPT, and I still use it often, but it gets way too "hair-splitting" argumentative over very minor, non-controversial topics. It's always going out of its way to say "well, actually..."

Grok used to be really really bad ~8 months ago or so, but it's gotten better.

ChatGPT team needs to turn down the 'disagree just because' factor by a lot.

epolanski 5 hours ago

My SO works in audit/compliance, and business Gemini definitely does not refuse to answer.

selicos 4 hours ago

1. It seeks to manipulate the information you see and your lens on the world. This is already partially true of independent and major publications.

As soon as we hand over searching out information to social media algorithms and LLM tools, we abandon our ability to see reality outside our direct vision.

Grok's ownership has already demonstrated capacity to influence major world elections and other events. You cannot trust it with this sort of information gathering and reporting.

fooker 6 hours ago

> the quality of that model

I guess the benchmarks disagree, but whenever I need to find specific information that does not easily show up in a web search, I try ChatGPT, Gemini, and Grok. Grok surfaces what I was looking for more often than the others.

Things like "find the github repo from 2017 that does $vague_thing".

chatmasta 5 hours ago

Grok does seem to have the best searching capabilities, and not just for twitter. I wonder what search engine they’re using on the backend.

PixyMisa 2 hours ago

Good question. You can actually see the searches it runs (momentarily), so testing could determine whether it's using public search engines or a private system.

PixyMisa 2 hours ago

I find that too. I use Claude for coding but when I need to dig out something based on limited data I turn to Grok and it delivers.

Azantys 5 hours ago

Isn't that more Perplexity's thing anyway?

gowld 5 hours ago

Can you give a specific example (that doesn't violate any privacy you want to protect)?

beepbopboopp 6 hours ago

Or the model was a marketing expense to capitalize the data center. I'm not saying it was intentionally that, but it's been an effective "that."

bpodgursky 6 hours ago

Eh. It was a leading model for a few weeks, it was a real effort, but they never built a real revenue model around it. It wasn't SaaS, it wasn't for governments, it couldn't get B2C payments. Made it hard to justify the training cost to stay at the frontier.

dsgn93 5 hours ago

[flagged]

Qhemlomo 6 hours ago

So like the 4D Chess Trump is playing with us?

Come on, the most logical thing is that Musk overestimated the compute he needs and got lucky with the secondary usage of it.

As soon as the IPO is done, assuming it doesn't fail, he will buy Cursor and try to push again, if he hasn't given up on it by then.

He also needs some compute for the robotics stuff and for Tesla in-car entertainment and for training FSD.

mbesto 6 hours ago

And they are planning (well, "planning," if you believe Elon) to rebuild their LLM from scratch, which means they need a HUGE-ass training data center to do so, i.e. not a data center for inference.

bottlepalm 6 hours ago

Grok isn't at the front of the frontier, but they are there for sure.

harrall 5 hours ago

But supposedly they’re the cheapest for certain workloads, especially ones with high token counts that can make use of caching.

So they’re cutting edge in that way.

gowld 5 hours ago

I am also an "AI lab," but I look more like a corporate cog, because that's where most of my revenue comes from and how I spend most of my time.

throwaway67678 6 hours ago

Pretty funny how making it anti-woke made it suck, whereas Claude's ultrawoke sensibilities and "constitution" didn't prevent it from being the de facto leader of the pack the moment it came out

plaidthunder 5 hours ago

It's a general problem with defining yourself in negative terms. Being "un-{thing I don't like}" doesn't say what you are. It only excludes one possibility while leaving behind an infinitude of mostly crappy alternatives to choose from. Having a positive set of beliefs annoys people and can make them feel judged, but at least it provides a vector that points somewhere definite in possibility space.