the_af 5 days ago

> The purpose of the test is whatever the tester decides it is.

Well, I did say:

> As an experiment, I cannot argue with this.

It was more a reflection on the fact that the primary goal of a lot of modern IF games, including "9:05", the first game mentioned in TFA, is not like "traversing a mountain". Traversing a mountain can have clear and meaningful goals, such as "reach the summit", or "avoid getting stuck", or "do not die or go missing after X hours". Though of course, appreciating nature and sightseeing is beyond the scope of an LLM.

Indeed, "9:05" has no other "goal" than, upon seeing a different ending from the main one, revisiting the game with the knowledge gained from that first playthrough. I'm being purposefully opaque in order not to spoil the game for you (you should play it, it's really short).

Let me put it another way: remember that fad, some years ago, of making you pay attention to an image or video, with a prompt like "colorblind people cannot see this shape after X seconds" so you pay attention and then BAM! A jump scare! Haha, joke's on you!

How would you "test" an LLM on such a jump scare? The goal is to scare a human. LLMs cannot be scared. What would the possible answers be?

A: I do not see any disappearing shapes after X seconds. Beep boop! I must not be colorblind, nor human, for I am an LLM. Beep!

or maybe

B: This is a well-known joke. Beep boop! After some short time, a monster appears on screen. This is intended to scare the person looking at it! Beep!

Would you say either response would show the LLM "playing" the game?

(Trust me, this is somewhat adjacent to the effect "9:05" would have on you, and I fear I've said too much!)