raincole 3 hours ago:

Even before this, Gemini 3 has always felt unbelievably 'general' to me. It can beat Balatro (ante 8) given only a text description of the game[0]. Yes, that's not an extremely difficult goal for humans, but consider:

1. It's an LLM, not something trained to play Balatro specifically.
2. Most players (probably >99.9%) can't do it on their first attempt.
3. I don't think many people have posted their Balatro playthroughs in text form online.

I think this is a much stronger signal of its 'generalness' than ARC-AGI. By the way, Deepseek can't play Balatro at all.
ankit219 4 minutes ago:

Agreed. Gemini 3 Pro has always felt to me like it has a pretraining alpha, if you will, and many data points continue to support that. Even Flash, which was post-trained with different techniques than Pro, is as good as or better than Pro at tasks that depend on post-training, occasionally even beating it (e.g. on Apex Bench from Mercor, which is, simplifying, basically a tool-calling test, Flash beats Pro). The ARC-AGI-2 score is another data point in the same direction. Deepthink is, I'm guessing based on my usage and understanding, a form of parallel test-time compute with some distillation and refinement from selected trajectories, same as gpt-5.2-pro, and it can extract more because of the pretraining datasets. I'm loosely basing this on papers like the limits-of-RLVR work and on the gap between pass@1 and pass@k in RL post-training of models: this score mostly shows how "skilled" the base model was, i.e. how strong its priors were. I apologize if this is not super clear; happy to expand on what I am thinking.
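(For readers unfamiliar with the pass@1 vs pass@k framing the parent mentions: the standard unbiased estimator from the Codex paper illustrates it. A minimal sketch; the sample numbers below are made up for illustration.)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n total attempts,
    c of which were correct, solves the task."""
    if n - c < k:
        return 1.0  # fewer incorrect samples than k: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# A base model with strong priors can show a large gap between
# pass@1 and pass@k at high k, even before RL post-training:
low = pass_at_k(100, 10, 1)    # ≈ 0.1
high = pass_at_k(100, 10, 50)  # close to 1.0
```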
tl 5 minutes ago:

Per BalatroBench, gemini-3-pro-preview makes it to round (not ante) 19.3 ± 6.8 on the lowest difficulty, on the deck aimed at new players; round 24 is ante 8's final round. This includes giving the LLM a strategy guide, which first-time players do not have. And Gemini isn't even emitting legal moves 100% of the time.
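(The round-to-ante arithmetic here: Balatro plays three rounds — small blind, big blind, boss blind — per ante, so ante 8 ends at round 24. A tiny sketch of the conversion:)

```python
def ante_of_round(round_no: int, rounds_per_ante: int = 3) -> int:
    """Map a Balatro round number to the ante it belongs to.
    Each ante spans 3 rounds (small, big, boss blind)."""
    return (round_no + rounds_per_ante - 1) // rounds_per_ante

# Beating ante 8 means finishing round 24; a mean of ~19 rounds
# puts the model in ante 7, short of the base win condition.
assert ante_of_round(24) == 8
assert ante_of_round(19) == 7
```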
ebiester 2 hours ago:

It's trained on YouTube data. It's going to have picked up Roffle and DrSpectred at the very least.
silver_sun 2 hours ago:

Google has a library of millions of scanned books from its Google Books project, which started in 2004. I think we have reason to believe there are more than a few books in there about playing traditional card games effectively, and that an LLM trained on that dataset could generalize to playing Balatro from a text description. Nonetheless, I still think it's impressive that we now have LLMs that can just do this.
| ||||||||||||||||||||||||||||||||
winstonp 3 hours ago:

DeepSeek hasn't been SotA in at least 12 calendar months, which might as well be a decade in LLM years.
| ||||||||||||||||||||||||||||||||
tehsauce 15 minutes ago:

How does it do on gold stake?
dudisubekti 2 hours ago:

But... there's Deepseek v3.2 in your link (rank 7).
littlestymaar 2 hours ago:

> I don't think there are many people who posted their Balatro playthroughs in text form online

There is *tons* of Balatro content on YouTube, though, and there is absolutely no doubt that Google is using YouTube content to train its models.
| ||||||||||||||||||||||||||||||||
acid__ 2 hours ago:

> Most (probably >99.9%) players can't do that at the first attempt

Eh, both my partner and I did this. To be fair, we weren't going in completely blind, and my partner hit a Legendary joker, but I think you might be slightly overstating the difficulty. I'm still impressed that Gemini did it.