Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus (tetrisbench.com)
76 points by ykhli 9 hours ago | 32 comments
bubblesorting 8 hours ago
Very cool! I'm a good Tetris player (in the top 10% of players) and wanted to give brick yeeting against an LLM a spin. Some feedback:
- Knowing the scoring system is helpful when going 1v1 for high score.
- Use a different randomization system; I kept getting starved for pieces like I. True random is fine. Throwing a copy of every piece into a bag and then drawing them one by one (7-bag) is better. Nearly random with some lookbehind to prevent getting a string of ZSZS (the TGM randomizer) is solid, too.
- Piece rotation feels left-biased and keeps making me misdrop: the T piece shifts to the left if you spin it 4 times. Check out https://tetris.wiki/images/thumb/3/3d/SRS-pieces.png/300px-S... or https://tetris.wiki/images/b/b5/Tgm_basic_ars_description.pn... for examples of how other games do it.
- Clockwise and counter-clockwise rotation are both important for human players; we can only hit so many keys per second.
- Remappable keys are also appreciated.
Nice work, I'm going to keep watching.
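The two randomizers and the rotation issue described above can be illustrated with a small Python sketch. `seven_bag` follows the 7-bag description directly. `history_randomizer` is only loosely in the spirit of TGM: the history length, reroll count, and "ZSZS" seed are assumptions, not a faithful copy of any specific game. `rotate_cw` rotates a piece about the fixed center of its bounding box, SRS-style, so four clockwise turns return it to exactly the cells it started in. All names here are hypothetical.

```python
import random
from collections import deque

PIECES = "IJLOSTZ"


def seven_bag(rng=random):
    """7-bag: shuffle one copy of each piece, deal them all, then refill."""
    while True:
        bag = list(PIECES)
        rng.shuffle(bag)
        yield from bag


def history_randomizer(history_len=4, tries=4, rng=random):
    """TGM-style sketch: reroll up to `tries` times if the piece was dealt recently."""
    # Assumed seed history; it makes S/Z less likely right at the start.
    history = deque("ZSZS", maxlen=history_len)
    while True:
        for _ in range(tries):
            piece = rng.choice(PIECES)
            if piece not in history:
                break  # accept a piece that is not in recent history
        # After `tries` misses the last roll is kept, so repeats are rare but possible.
        history.append(piece)
        yield piece


def rotate_cw(cells, size):
    """Rotate (row, col) cells clockwise inside a fixed size x size box.

    Rotating around the box's fixed center means four turns bring the
    piece back exactly where it started, so it never drifts to one side.
    """
    return {(c, size - 1 - r) for r, c in cells}


# T piece in its spawn orientation inside a 3x3 box (row 0 is the top).
T_SPAWN = {(0, 1), (1, 0), (1, 1), (1, 2)}
```

Applying `rotate_cw(cells, 3)` four times to `T_SPAWN` yields `T_SPAWN` again. Real SRS also defines wall-kick offsets for blocked rotations, which this sketch omits.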
ykhli 6 hours ago
Thanks for all the questions! More details on how this works:
- Each model starts with an initial optimization function for evaluating Tetris moves.
- As the game progresses, the model sees the current board state and updates its algorithm, adapting its strategy to how the game is evolving.
- The model continuously refines its optimizer. It decides when it needs to re-evaluate and when it should implement the next optimization function.
- The model generates updated code, executes it to score all placements, and picks the best move.
- I reframed this as a coding problem because Tetris is, by nature, an optimization game. At first I did try asking LLMs where to place each piece at every turn, but models are just terrible at visual reasoning. What LLMs are great at, though, is coding.
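The generated optimizers themselves aren't shown, so the following is only a minimal Python sketch of the loop described above: enumerate every (rotation, column) placement, hard-drop the piece, clear lines, score the resulting board with a weighted heuristic, and pick the maximum. The board representation, the feature set (aggregate height, lines cleared, holes, bumpiness), and the weights are all assumptions chosen for illustration; in the described setup, the LLM would be the one writing and revising something like `evaluate`.

```python
WIDTH, HEIGHT = 10, 20

# Hypothetical, hand-picked weights for illustration only.
WEIGHTS = {"height": -0.51, "lines": 0.76, "holes": -0.36, "bumpiness": -0.18}

# A piece is a set of (row, col) cells; row 0 is the top of the board.
I_PIECE = {(0, 0), (0, 1), (0, 2), (0, 3)}


def rotations(cells):
    """Return the distinct rotations of a piece, each shifted to start at (0, 0)."""
    out, cur = [], set(cells)
    for _ in range(4):
        min_r = min(r for r, _ in cur)
        min_c = min(c for _, c in cur)
        norm = frozenset((r - min_r, c - min_c) for r, c in cur)
        if norm not in out:
            out.append(norm)
        cur = {(c, -r) for r, c in cur}  # rotate 90 degrees clockwise
    return out


def drop(board, piece, col):
    """Hard-drop `piece` shifted right by `col`; return the new board or None."""
    if any(not 0 <= c + col < WIDTH for _, c in piece):
        return None

    def fits(off):
        return all(r + off < HEIGHT and (r + off, c + col) not in board
                   for r, c in piece)

    if not fits(0):
        return None  # no room at the top of this column
    row = 0
    while fits(row + 1):
        row += 1
    return board | {(r + row, c + col) for r, c in piece}


def clear_lines(board):
    """Remove full rows, shift the cells above them down, and count rows cleared."""
    full = {r for r in range(HEIGHT) if all((r, c) in board for c in range(WIDTH))}
    kept = set()
    for r, c in board:
        if r not in full:
            kept.add((r + sum(1 for f in full if f > r), c))
    return kept, len(full)


def evaluate(board, lines):
    """Weighted sum of aggregate height, lines cleared, holes, and bumpiness."""
    heights = []
    for c in range(WIDTH):
        rows = [r for r, cc in board if cc == c]
        heights.append(HEIGHT - min(rows) if rows else 0)
    holes = sum(1 for c in range(WIDTH)
                for r in range(HEIGHT - heights[c], HEIGHT) if (r, c) not in board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (WEIGHTS["height"] * sum(heights) + WEIGHTS["lines"] * lines
            + WEIGHTS["holes"] * holes + WEIGHTS["bumpiness"] * bumpiness)


def best_move(board, piece):
    """Score every (rotation, column) placement; return (score, rotation, column)."""
    best = None
    for rot, shape in enumerate(rotations(piece)):
        for col in range(WIDTH):
            placed = drop(board, shape, col)
            if placed is None:
                continue
            cleared, lines = clear_lines(placed)
            score = evaluate(cleared, lines)
            if best is None or score > best[0]:
                best = (score, rot, col)
    return best
```

For example, `best_move(set(), I_PIECE)` lays the I piece flat against the left wall. The sketch ignores hold, next-piece preview, and whether a placement is actually reachable by sliding; pieces are simply dropped straight down.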
augusteo 2 hours ago
LLMs playing Tetris feels like testing a calculator's ability to write poetry. Interesting as a curiosity, but the results don't transfer to the tasks where these models actually excel. Curious what the latency looks like per move. That seems like the actual bottleneck here. | ||||||||||||||||||||||||||||||||||||||
bityard 5 hours ago
Looks fun, but I'm not willing to give out my email address just to play a game. Also, if the creator is reading this, you should know that Tetris Holdings is extremely aggressive with their trademark enforcement. | ||||||||||||||||||||||||||||||||||||||
vunderba 7 hours ago
Interesting, but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent for Tetris that represents an ongoing game, passing an ASCII representation, encoding it as a JSON structure, or just sending screenshots of the game directly to the various LLMs?
OGEnthusiast 8 hours ago
Gemini 3 Flash is at a very nice point along the price-performance curve: a good workhorse model, which you can supplement with Opus 4.5 / Gemini 3 Pro for more complex tasks.
burkaman 8 hours ago
It's actually 80% against Opus, 66% average against the 5 models it's tested with. | ||||||||||||||||||||||||||||||||||||||
p0w3n3d 6 hours ago
Guys, I don't know how to tell you this, but... Tetris can be solved without an LLM...
esafak 7 hours ago
I imagine this is because Tetris is visual and the Gemini models are strong visually. | ||||||||||||||||||||||||||||||||||||||
arendtio 8 hours ago
There are some concepts clashing here. If you let the LLM build a Tetris bot, it would be 1000x better than what the LLMs are doing now. So yes, it is fun to win against an AI, but against that much processing power you shouldn't be able to win. It is only possible because LLMs are not built for such tasks.
akomtu 8 hours ago
It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone no-dependencies C/C++ program that fits in NNN lines of code. | ||||||||||||||||||||||||||||||||||||||
segmondy 6 hours ago
... and what does this prove? Based on TetrisBench, what would you choose one LLM over another for, besides playing Tetris?
tiahura 7 hours ago
I'd like to see a NetHackBench.
indigodaddy 5 hours ago
Is there a tl;dr on why this is? Does it just make faster decisions? | ||||||||||||||||||||||||||||||||||||||
purplecats 6 hours ago
watch link? | ||||||||||||||||||||||||||||||||||||||