XCSme 6 days ago

I tried 4.1-mini and 4.1-nano. The responses are a lot faster, but for my use case they seem to be a lot worse than 4o-mini (they fail to complete the task when 4o-mini could do it). Maybe I have to update my prompts...

XCSme 6 days ago | parent | next [-]

Even after updating my prompts, 4o-mini still seems to do better than 4.1-mini or 4.1-nano for a data-processing task.

BOOSTERHIDROGEN 6 days ago | parent [-]

Mind sharing your system prompt?

XCSme 6 days ago | parent [-]

It's quite complex, but the task is to parse some HTML content, or to choose from a list of URLs which one is the best.

I will check the prompt again; maybe 4o-mini ignores some instructions that 4.1 doesn't (instructions which might result in the LLM returning zero data).

jjani 5 days ago | parent | prev [-]

That sounds incredibly disappointing given how high their benchmarks are, which suggests they might be overtuned for those benchmarks, similar to Llama 4.

XCSme 5 days ago | parent [-]

Yeah, I think so too. They seemed to be better at specific tasks, but worse overall on broader tasks.