I know they define "achievements" in order to measure "how well" the LLM plays the game, and by definition this is arbitrary. As an experiment, I cannot argue with this.

However, I must point out the kind of "modern" (relatively speaking) adventure games mentioned in the article -- which are more accurately called "interactive fiction" by the community -- is not very suitable for this kind of experiment. Why? Well, because so many of them are exploratory/experimental, and not at all about "winning" (unlike, say, "Colossal Cave Adventure", where there is a clear goal).

You cannot automate (via LLM) "playing" them, because they are all about the thoughts and emotions (and maybe shocked laughter) they elicit in human players. This cannot be automated.

If you think I'm being snobby, consider this: the first game TFA mentions is "9:05". Now, you can set goals for a bot to play this game, but truly -- if you've played the game -- you know this would be completely missing the point. You cannot "win" this game, it's all about subverting expectations, and about replaying it once you've seen the first, most straightforward ending, and having a laugh about it.

Saying more will spoil the game :)

(And do note there's no such thing as "spoiling a game" for an LLM, which is precisely the reason they cannot truly "play" these games!)

Terr_ 5 days ago | parent | next [-]

That's like saying it's wrong to test a robot's ability to navigate and traverse a mountain... because the mountain has no win-condition and is really a context for human emotional experiences.

The purpose of the test is whatever the tester decides it is. If that means finding X% of the ambiguously-good game endings within a budget of Y commands, then so be it.

	▲	the_af 5 days ago | parent [-]
		> The purpose of the test is whatever the tester decides it is.

		Well, I did say:

		> As an experiment, I cannot argue with this.

		It was more a reflection on the fact that the primary goal of a lot of modern IF games, among which there is "9:05", the first game mentioned in TFA, is not like "traversing a mountain". Traversing a mountain can have clear and meaningful goals, such as "reach the summit", or "avoid getting stuck", or "do not die or go missing after X hours". Though of course, appreciating nature and sightseeing is beyond the scope of an LLM.

		Indeed, "9:05" has no other "goal" than, upon seeing a different ending from the main one, revisiting the game with the knowledge gained from that first playthrough. I'm being purposefully opaque in order not to spoil the game for you (you should play it, it's really short).

		Let me put it another way: remember that fad, some years ago, of making you pay attention to an image or video, with a prompt like "colorblind people cannot see this shape after X seconds" so you pay attention and then BAM! A jump scare! Haha, joke's on you!

		How would you "test" an LLM on such a jump scare? The goal is to scare a human. LLMs cannot be scared. What would the possible answers be?

		A: I do not see any disappearing shapes after X seconds. Beep boop! I must not be colorblind, nor human, for I am an LLM. Beep!

		or maybe

		B: This is a well-known joke. Beep boop! After some short time, a monster appears on screen. This is intended to scare the person looking at it! Beep!

		Would you say either response would show the LLM "playing" the game?

		(Trust me, this is a somewhat adjacent effect to what "9:05" would play on you, and I fear I've said too much!)

fmbb 5 days ago | parent | prev | next [-]

Of course you can automate ”having fun” and ”being entertained”. That is, if you believe humanity will ever build artificial intelligence.

	▲	the_af 5 days ago | parent | next [-]
		> Of course you can automate ”having fun” and ”being entertained”

		This seems like begging the question to me. I don't think there's a mechanistic (as in "token predictor") procedure to generate the emotions of having fun, or being surprised, or amazed. It's not on me to demonstrate it cannot be done, it's on them to demonstrate it can.

		But to be clear, I don't think the author of TFA is making this claim either. They are simply approaching IF games from a "problem solving" perspective -- they don't claim this has anything to do with fun or AGI -- and what I'm arguing is that this mechanistic approach to IF games, i.e. "problem solving", only touches on a small subset of what makes people want to play these games. They are often (not all, as the author rightly corrects me, but often) about generating surprise and amazement in the player, something that cannot be done to an LLM.

		(Note I'm also not dismissing the author's experiment. As an experiment it's interesting and, I'd argue, fun for the author.)

		Current state-of-the-art LLMs cannot feel amazement, or anything else really (and, I argue, no LLM in the current tech branch ever will). I hope this isn't a controversial statement.
	▲	drdeca 5 days ago | parent | prev [-]
		A p-zombie would not have fun or be entertained, only act like it does. I don’t think AGI requires being unlike a p-zombie in this way.

kqr 5 days ago | parent | prev [-]

I disagree. Lockout, Dreamhold, Lost Pig, and So Far are new games but in the old style. Plundered Hearts is literally one of the old games (though ahead of its time).

I'll grant you that 9:05 and For a Change are somewhat more modern: the former has easy puzzles, the latter very abstract puzzles.

I disagree that new text adventures are not about puzzles and winning. They come in all kinds of flavours these days. Even games like 9:05 pace their narrative with traditional puzzles, meaning we can measure forward progress just the same. And to be fair, LLMs are so bad at these games that in these articles, I'm merely trying to get them to navigate the world at all.

If anything, I'd argue Adventure is a bad example of the genre you refer to. It was (by design) more of a caving simulator/sandbox with optional loot than a game with progress toward a goal.

dfan 5 days ago | parent | next [-]

As the author of For A Change, I am astonished that anyone would think it was a good testbed for an LLM text adventure solver. It's fun that they tried, though.

kqr 5 days ago | parent [-]

Thank you for making it. The imagery of it is striking and comes back to me every now and then. I cannot unhear "a high wall is not high to be measured in units of length, but of angle" -- beautifully put.

The idea was that it'd be a good example of having to navigate somewhat foreign but internally consistent worlds, an essential text adventure skill.

	▲	dfan 5 days ago | parent [-]
		Ha, I didn't realize that I was replying to the person who wrote the post! The audience I had in mind when writing it was people who were already quite experienced in playing interactive fiction and could then be challenged in a new way while bringing their old skills to bear. So it's sort of a second-level game in that respect (so is 9:05, in different ways, as someone else mentioned).

the_af 5 days ago | parent | prev [-]

We will have to agree to disagree, if you'll allow me the cliche.

I didn't use Adventure as an example of IF; it belongs in the older "text adventure" genre. That is why I thought it would be a more fitting test for LLMs, since it's not about experiences but about maxing points.

I think there's nothing to "solve" that an LLM can solve about IF. This genre of games, in its modern expression, is about breaking boundaries and expectations, and making the player enjoy this. Sometimes the fun is simply seeing different endings and how they relate to each other. Since LLMs cannot experience joy or surprise, and can only mechanically navigate the game (maybe "explore all possible end states" is a goal?), they cannot "play" it. Before you object: I'm aware you didn't claim the LLMs are really playing the game!

But here's a test for your set of LLMs: how would they "win" at "Rematch"? This game is about repeatedly dying, understanding what's happening, and stringing together a single sentence that will break the cycle and win the game. Can any LLM do this, a straightforward puzzle? I'd be impressed!

	▲	kqr 5 days ago | parent [-]
		I think I see what you mean and with these clarifications we are in agreement. There are a lot of modern works of interactive fiction that go way beyond what the old text adventures did, and work even when judged as art or literature. I just haven't played many of them because I'm a fan of the old-style games.

		As for the specific question, they would progress at Rematch by figuring out ever more complicated interactions that work and will be used to survive, naturally.