resters 6 days ago

I use them as follows:

o1-pro: anything important involving accuracy or reasoning. Does the best at accomplishing things correctly in one go even with lots of context.

deepseek R1: anything where I want high-quality non-academic prose or poetry. Hands down the best model for these. Also very solid for fast and interesting analytical takes. I love bouncing ideas around with R1 and Grok-3 because of their fast responses and reasoning. I think R1 is the most creative yet also the best at mimicking prose styles and tone. I've speculated that Grok-3 is R1 with mods and think it's reasonably likely.

4o: image generation, occasionally something else but never for code or analysis. Can't wait till it can generate accurate technical diagrams from text.

o3-mini-high and grok-3: code or analysis that I don't want to wait for o1-pro to complete.

claude 3.7: occasionally for code if the other models are making lots of errors. Sometimes models will anchor to outdated information in spite of being informed of newer information.

gemini models: occasionally I test to see if they are competitive, so far not really, though I sense they are good at certain things. Excited to try 2.5 Deep Research more, as it seems promising.

Perplexity: discontinued subscription once the search functionality in other models improved.

I'm really looking forward to o3-pro. Let's hope it's available soon as there are some things I'm working on that are on hold waiting for it.

rushingcreek 6 days ago | parent | next [-]

Phind was fine-tuned specifically to produce inline Mermaid diagrams for technical questions (I'm the founder).

underlines 5 days ago | parent | next [-]

I really loved Phind and always think of it as the OG perplexity / RAG search engine.

Sadly, I stopped my subscription when you removed the ability to weight my own domains...

Otherwise, the fine-tune for your output format for technical questions is great, with the options, the pros/cons, and the Mermaid diagrams. Just way better for technical searches than what the generic services can provide.

bsenftner 5 days ago | parent | prev [-]

Have you been interviewed anywhere? Curious to read your story.

shortcord 6 days ago | parent | prev | next [-]

Gemini 2.5 Pro is quite good at code.

It has become my go-to in Cursor. Claude 3.7 needs to be restrained too much.

artdigital 5 days ago | parent | next [-]

Same here, 2.5 Pro is very good at coding. But it's also cocky and blames everything but itself when something doesn't work, e.g. “the linter must be wrong, you should reinstall it”, “looks to be a problem with the Go compiler”, “this function HAS to exist, that's weird that we're getting an error”.

And it often just stops with something like “ok, this is still not working. You fix it and tell me when it's done so I can continue”.

But for coding: Gemini Pro 2.5 > Sonnet 3.5 > Sonnet 3.7

valenterry 6 days ago | parent | prev | next [-]

Weird. For me, Sonnet 3.7 is much more focused, and in particular works much better at finding the places that need changes and at using other tooling. I guess the integration in Cursor is just much better and more mature.

behnamoh 6 days ago | parent | prev | next [-]

This. Sonnet 3.7 is a wild horse. Gemini 2.5 Pro is like a 33-year-old expert. o1 feels like a mature, senior colleague.

benhurmarcel 5 days ago | parent | prev [-]

I find that Gemini 2.5 Pro tends to produce working but over-complicated code more often than Claude 3.7.

torginus 5 days ago | parent [-]

Which might be a side-effect of the reasoning.

In my experience, whenever these models solve a math or logic puzzle with reasoning, they generate extremely long and convoluted chains of thought, which show up in the solution.

In contrast, a human would come up with a solution in 2-3 steps. Perhaps something similar is going on here with the generated code.

motoboi 6 days ago | parent | prev | next [-]

You probably know this, but it can already generate accurate diagrams. Just ask for the output in a diagram language like Mermaid or Graphviz.
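
Something like this minimal sketch, using the OpenAI Python client and the Graphviz `dot` CLI (the model name, prompt wording, and filename are just my illustrative choices):

    # Sketch: ask the model for Graphviz DOT source instead of a raster
    # image, then render it locally. Prompt and model are illustrative.
    from openai import OpenAI
    import subprocess

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Diagram the request/response flow of a reverse proxy "
                       "as a Graphviz digraph. Reply with DOT source only, "
                       "no prose, no code fences.",
        }],
    )
    dot_source = resp.choices[0].message.content

    # Render to SVG with the `dot` binary; the layout engine handles
    # positioning, so the model only has to get the structure right.
    svg = subprocess.run(["dot", "-Tsvg"], input=dot_source.encode(),
                         capture_output=True, check=True).stdout
    with open("diagram.svg", "wb") as f:
        f.write(svg)

Since DOT only specifies nodes and edges, positioning is delegated to the layout engine, which sidesteps most of the overlap problems you get when the model has to place things itself.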

bangaladore 6 days ago | parent | next [-]

My experience is that it often produces terrible diagrams: things clearly overlap, lines make no sense. I'm not surprised; if you told me to lay out a diagram in XML/YAML, there would be obvious mistakes and layout issues.

I'm not really certain a text-output model can ever do well here.

resters 6 days ago | parent | next [-]

FWIW, I think a multimodal model could be trained to do extremely well at this, given sufficient training data: a combination of a textual description of the system and/or diagram, the source code for the diagram (Mermaid, SVG, etc.), and the resulting image, with training to translate between all three.

bangaladore 6 days ago | parent [-]

Agreed. Even without that, I'm sure a service like this already exists (or could easily exist) where the workflow is something like:

1. User provides information

2. LLM generates structured output for whatever modeling language

3. The same or another multimodal LLM reviews the generated graph for styling/positioning issues and ensures it matches the user's request.

4. LLM generates structured output based on the feedback.

5. etc...

But you could probably fine-tune a multimodal model to do it in one shot, or at least way more effectively; a sketch of the iterative version follows.
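
A rough sketch of that loop, assuming the OpenAI Python client and mermaid-cli (`mmdc`) for rendering; the prompts, model name, and retry limit are made-up placeholders:

    # Sketch of the generate -> render -> review -> regenerate loop.
    import base64
    import pathlib
    import subprocess
    import tempfile

    from openai import OpenAI

    client = OpenAI()

    def ask(content):
        # `content` may be a plain string or a list of multimodal parts.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content

    def render_mermaid(source: str) -> bytes:
        # Render Mermaid source to PNG with mermaid-cli (`mmdc`).
        with tempfile.TemporaryDirectory() as d:
            src, out = pathlib.Path(d, "in.mmd"), pathlib.Path(d, "out.png")
            src.write_text(source)
            subprocess.run(["mmdc", "-i", str(src), "-o", str(out)], check=True)
            return out.read_bytes()

    request = "Draw the components of a message queue system."

    # Steps 1-2: user request in, structured diagram source out.
    source = ask(f"{request}\nReply with Mermaid source only, no code fences.")

    for _ in range(3):  # steps 3-5: a few review/fix rounds, then stop
        png_b64 = base64.b64encode(render_mermaid(source)).decode()
        feedback = ask([
            {"type": "text",
             "text": f"Does this rendered diagram match the request "
                     f"{request!r}? List styling or positioning problems, "
                     "or reply with just OK."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{png_b64}"}},
        ])
        if feedback.strip() == "OK":
            break
        source = ask(
            f"Revise this Mermaid source to address the feedback.\n"
            f"Feedback: {feedback}\nSource:\n{source}\n"
            "Reply with Mermaid source only."
        )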

behnamoh 6 days ago | parent | prev [-]

I had a LaTeX TikZ diagram problem which Sonnet 3.7 couldn't handle even after 10 attempts. Gemini 2.5 Pro solved it on the second try.

gunalx 6 days ago | parent [-]

Had the same experience: o3-mini failed miserably, Claude 3.7 as well, but Gemini 2.5 Pro solved it perfectly. (The task: going from an image of a diagram, without source, to a TikZ diagram.)

resters 6 days ago | parent | prev | next [-]

I've had mixed and inconsistent results, and it hasn't been able to iterate effectively when it gets close. Could be that I need to refine my approach to prompting. I've tried Mermaid and SVG mostly, but will also try Graphviz based on your suggestion.

antman 6 days ago | parent | prev [-]

PlantUML (action) diagrams are my go-to.

wavewrangler 6 days ago | parent | prev | next [-]

You probably know this and are looking for consistency, but a little trick I use is to feed in the original data of what I need as a diagram and have it re-imagine it as an image “ready for print”. Not native, but still a time saver, and it handles even unstructured data surprisingly well. Again, not native… naive, yes; native, not yet. Be sure to double-check, triple-check as always. Give it the ol’ OCD treatment.

barrkel 5 days ago | parent | prev | next [-]

Gemini 2.5 is very good. Since you have to wait for reasoning tokens, it takes longer to come back, but the responses are high quality IME.

czk 6 days ago | parent | prev [-]

re: "grok-3 is r1 with mods" -- do you mean you believe they distilled deepseek r1? that was my assumption as well, though i thought it more jokingly at first it would make a lot of sense. i actually enjoy grok 3 quite a lot, it has some of the most entertaining thinking traces.