viraptor | 4 days ago
I'm glad he improved the prompting, but he's still leaving out two likely huge improvements:

1. Explain the current board position and the plan going forward, before proposing a move. This lets the model actually think more, kind of like o1, but here it would force more focused processing.

2. Actually draw the ASCII board for each step. Hopefully that produces more valid moves, since board + move is easier to process reliably than a list of 20 moves.
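A rough sketch of what suggestion 1 could look like as a prompt (illustrative wording and moves, not the author's actual prompt):

    Moves so far: 1. e4 e5 2. Nf3 Nc6
    Before moving, briefly describe the current position (material,
    threats, pawn structure) and state your plan. Then give exactly
    one move in standard algebraic notation.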
duskwuff | 4 days ago
> 2. Actually draw the ascii board for each step.

I doubt this is going to make much difference. 2D "graphics" like ASCII art are foreign to language models: they perceive text as a stream of tokens (including newlines), so "vertical" relationships between lines of text aren't obvious to them the way they are to a human viewer. Having that board diagram in the context window isn't likely to help the model reason about the game. Having the model list out the positions of each piece on the board in plain text (e.g. "Black knight at c5") might be a more suitable way to reinforce its positional awareness.
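That piece listing is also easy to generate mechanically. A minimal sketch using the python-chess library (assuming it's installed; the exact phrasing fed to the model is arbitrary):

    # Turn a board into a plain-text piece list like "Black knight at c5".
    import chess

    def describe_position(board):
        pieces = []
        for square, piece in board.piece_map().items():
            color = "White" if piece.color == chess.WHITE else "Black"
            name = chess.piece_name(piece.piece_type)
            pieces.append(f"{color} {name} at {chess.square_name(square)}")
        return "\n".join(sorted(pieces))

    board = chess.Board()
    for san in ["e4", "e5", "Nf3", "Nc6"]:
        board.push_san(san)
    print(describe_position(board))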
| ||||||||||||||||||||||||||
tedsanders | 4 days ago
Chain of thought helps with many problems, but it actually tanks GPT’s chess performance. The regurgitation trick was the best (non-fine-tuning) technique in my own chess experiments 1.5 years ago.
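For context, the regurgitation trick is roughly: have the model repeat the game so far before answering. An illustrative prompt shape (not the exact wording from those experiments):

    Here is the game so far: 1. e4 e5 2. Nf3 Nc6 3. Bb5 a6
    First repeat the entire move list above verbatim, then give
    exactly one new move in standard algebraic notation.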
TeMPOraL | 4 days ago
RE 2., I doubt it'll help - for at least two reasons, already mentioned by 'duskwuff and 'daveguy. RE 1., definitely worth trying, and there are more variants of such tricks specific to particular models. I'm out of date on the OpenAI docs, but with Anthropic models, the docs suggest using XML notation to label and categorize the most important parts of the input. This kind of soft structure seems to improve the results coming from Claude models; I imagine they specifically trained the models to recognize it. See: https://docs.anthropic.com/en/docs/build-with-claude/prompt-... In the author's case, for Anthropic models, the final prompt could look like this:
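    You are a chess grandmaster.

    <instructions>
    You will be given a partially completed game. Repeat the
    entire game so far, then give exactly ONE new move in
    standard algebraic notation. Do not explain your move.
    </instructions>

    <game>
    1. e4 e5 2. Nf3
    </game>

(Tag names here are just an illustration; what matters is labeling the important parts of the input consistently.)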
This kind of prompting is supposed to provide a noticeable improvement for Anthropic models. Ironically, I only discovered it a few weeks ago, despite having used Claude 3.5 Sonnet extensively for months. Which goes to show, RTFM is still a useful skill. Maybe OpenAI models have similar affordances too, simple but somehow unnoticed? (I'll re-check the docs myself later.)
daveguy | 4 days ago
> Actually draw the ascii board for each step.

The relative rarity of this representation in training data means it would probably degrade responses rather than improve them. I'd like to see the results of this, because I would be very surprised if it improved the responses.
unoti | 4 days ago
I came here to say basically the same thing. The improvement the OP saw by asking it to repeat all the moves so far comes from giving the LLM more time and space to think. My hypothesis is that giving it more time and space to think in other ways could improve performance even more: for example, showing the current board position and asking it to analyze the position, list key challenges and strengths, list the strategies possible from here, select one of those strategies, and only then give its move. In general, asking it to really think rather than blurt out a move. The examples would be key here. These ideas were shown to work very well in the ReAct paper (and, by extension, the Chain of Thought (CoT) paper). You could also extend this by asking it to do this N times and stopping when a majority of the answers agree (an idea borrowed from the CoT-SC paper, chain-of-thought self-consistency).
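A minimal sketch of that last idea, with get_move standing in for whatever call samples one move from the model (a hypothetical helper, not a real API):

    # CoT-SC-style self-consistency: sample up to n candidate moves
    # and stop as soon as one move has a strict majority.
    from collections import Counter

    def majority_move(prompt, get_move, n=5):
        votes = Counter()
        for _ in range(n):
            votes[get_move(prompt)] += 1
            move, count = votes.most_common(1)[0]
            if count > n // 2:  # strict majority reached early
                return move
        return votes.most_common(1)[0][0]  # otherwise take the plurality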
| ||||||||||||||||||||||||||
ilaksh | 4 days ago
The fact that he hasn't tried this leads me to think that deep down he doesn't want the models to succeed and really just wants to make more charts. |