viraptor 4 days ago

I'm glad he improved the prompting, but he's still leaving out two likely huge improvements.

1. Explain the current board position and the plan going forward, before proposing a move. This lets the model actually think more, kind of like o1, but here it would guarantee more focused processing.

2. Actually draw the ASCII board for each step. Hopefully this produces more valid moves, since board + move is easier to reliably process than 20 × move.

duskwuff 4 days ago | parent | next [-]

> 2. Actually draw the ASCII board for each step.

I doubt that this is going to make much difference. 2D "graphics" like ASCII art are foreign to language models - the models perceive text as a stream of tokens (including newlines), so "vertical" relationships between lines of text aren't obvious to them like they would be to a human viewer. Having that board diagram in the context window isn't likely to help the model reason about the game.

Having the model list out the positions of each piece on the board in plain text (e.g. "Black knight at c5") might be a more suitable way to reinforce the model's positional awareness.
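
A minimal sketch of that plain-text listing, assuming the python-chess library (my pick, not something specified above):

  import chess

  def piece_listing(board: chess.Board) -> str:
      # One line per piece, e.g. "Black knight at c5".
      lines = []
      for square, piece in sorted(board.piece_map().items()):
          color = "White" if piece.color == chess.WHITE else "Black"
          name = chess.piece_name(piece.piece_type)
          lines.append(f"{color} {name} at {chess.square_name(square)}")
      return "\n".join(lines)

  board = chess.Board()
  board.push_san("e4")
  print(piece_listing(board))  # "White rook at a1", ..., "White pawn at e4", ...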

magicalhippo 4 days ago | parent | next [-]

I've had some success getting models to recognize simple electronic circuits drawn using ASCII art, including stuff like identifying a buck converter circuit in various guises.

However, as you point out, the way we feed these models in particular makes them vertically challenged, so to speak. This leaves them unable, for example, to reliably identify vertically separated components in a circuit.

With combined vision+text models becoming more commonplace, perhaps running the rendered text input through the vision model could help.

yccs27 4 days ago | parent | prev [-]

With positional encoding, an ASCII board diagram actually shouldn't be that hard to read for an LLM. Columns and diagonals are just different strides through the flattened board representation.
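
As a toy illustration of the stride idea (my own sketch, not from the parent):

  # 8 squares per rank + '\n' gives the flattened string a stride of 9.
  board = ("rnbqkbnr\n" "pppppppp\n" "........\n" "........\n"
           "........\n" "........\n" "PPPPPPPP\n" "RNBQKBNR\n")
  STRIDE = 9
  e_file = board[4::STRIDE]       # the e-file, stride 9:        "kp....PK"
  a1_h8  = board[7 * STRIDE::-8]  # the a1-h8 diagonal, stride -8: "RP....pr"
  print(e_file, a1_h8)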

tedsanders 4 days ago | parent | prev | next [-]

Chain of thought helps with many problems, but it actually tanks GPT’s chess performance. The regurgitation trick was the best (non-fine tuning) technique in my own chess experiments 1.5 years ago.

TeMPOraL 4 days ago | parent | prev | next [-]

RE 2., I doubt it'll help - for at least two reasons, already mentioned by 'duskwuff and 'daveguy.

RE 1., definitely worth trying, and there are more variants of such tricks specific to particular models. I'm out of date on OpenAI docs, but with Anthropic models, the docs suggest using XML notation to label and categorize the most important parts of the input. This kind of soft structure seems to improve the results coming from Claude models; I imagine they specifically trained the model to recognize it.

See: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

In the author's case, for Anthropic models, the final prompt could look like this:

  <role>You are a chess grandmaster.</role>
  <instructions>
  You will be given a partially completed game, contained in <game-log> tags.
  After seeing it, you should repeat the ENTIRE GAME and then give ONE new move.
  Use standard algebraic notation, e.g. "e4" or "Rdf8" or "R1a3".
  ALWAYS repeat the entire representation of the game so far, putting it in <new-game-log> tags.
  Before giving the new game log, explain your reasoning inside a <thinking> tag block.
  </instructions>
  
  <example>
    <request>
      <game-log>
        *** example game ***
      </game-log>
    </request>
    <reply>
      <thinking> *** some example explanation ***</thinking>
      <new-game-log> *** game log + next move *** </new-game-log>
    </reply>   
   
  </example>
  
  <game-log>
   *** the incomplete game goes here ***
  </game-log>

This kind of prompting is supposed to provide a noticeable improvement for Anthropic models. Ironically, I only discovered it a few weeks ago, despite having used Claude 3.5 Sonnet extensively for months. Which goes to show, RTFM is still a useful skill. Maybe OpenAI models have similar affordances too, simple but somehow unnoticed? (I'll re-check the docs myself later.)
daveguy 4 days ago | parent | prev | next [-]

> Actually draw the ASCII board for each step.

The relative rarity of this representation in training data means it would probably degrade responses rather than improve them. I'd like to see the results of this, because I would be very surprised if it improved the responses.

unoti 4 days ago | parent | prev | next [-]

I came here to basically say the same thing. The improvement the OP saw from asking it to repeat all the moves so far comes from giving the LLM more time and space to think. My hypothesis is that giving it more time and space to think in other ways could improve performance even more: show it the current board position and ask it to analyze the position, listing key challenges and strengths; ask it for a list of strategies possible from here; ask it to select a strategy from that list; and only then ask it for its move. In general, ask it to really think rather than blurt out a move. The examples would be key here.
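
Roughly, as a sketch (complete() is a hypothetical stand-in for whatever single-turn chat-completion call you use):

  def pick_move(complete, game_log: str) -> str:
      # Each stage feeds the previous answer back in as context.
      analysis = complete(f"Game so far:\n{game_log}\n"
                          "Analyze this position: list key challenges and strengths.")
      options  = complete(f"{analysis}\nList the strategies possible from here.")
      strategy = complete(f"{options}\nSelect the best strategy from that list.")
      return complete(f"{strategy}\nNow give ONE move in standard algebraic notation.")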

These ideas were proven to work very well in the ReAct paper (and by extension the Chain of Thought, CoT, paper). You could also extend this by asking it to do this N times and stopping when a majority give the same answer (an idea stolen from the CoT-SC paper, chain-of-thought self-consistency).
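
And a minimal sketch of that majority-vote stopping rule, with propose_move as a hypothetical stand-in for one full prompted analysis round:

  from collections import Counter

  def self_consistent_move(propose_move, n: int = 5) -> str:
      # Sample up to n answers; stop early once one move has a strict majority.
      votes = Counter()
      for _ in range(n):
          votes[propose_move()] += 1
          move, count = votes.most_common(1)[0]
          if count > n // 2:
              break
      return move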

viraptor 4 days ago | parent [-]

It would be awesome if the author released a framework to play with this. I'd like to test things out, but I don't want to spend time redoing all his work from scratch.

fragmede 4 days ago | parent [-]

Just have ChatGPT write the framework.

viraptor a day ago | parent [-]

If it takes so little time and is so trivial, you're welcome to send me a link to your generated solution.

ilaksh 4 days ago | parent | prev [-]

The fact that he hasn't tried this leads me to think that deep down he doesn't want the models to succeed and really just wants to make more charts.