| ▲ | quinnjh 6 hours ago | parent | next [-] | | It (the thinking steps) has moments of brilliance, and generally convincing-looking steps and improved outputs. Whether that is reasoning seems to be a matter of interpretation. From skimming the log:
> After popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, because pushing adds to the top. So after popping 2, the stack is [X1], then pushing X2 adds it on top → [X2, X1]. Wait, no, when you push, you add to the top. So after popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, wait, the stack is LIFO. So pushing X2 would put it on top of X1 → stack becomes [X2, X1]? No, no. Wait, after popping the 2, the stack is [X1]. Then pushing X2 would make the stack [X2, X1]? No, no. Wait, when you push, the new element is added to the top. So after popping the 2 (so stack is [X1]), then pushing X2 gives [X2, X1]? No, no.
> Wait, the stack was [X1], then pushing X2 would make it [X2] on top of X1 → so stack is [X2, X1]? Yes, exactly. | |
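(For reference, a minimal Python sketch of the stack operations the quoted trace keeps second-guessing; the X1/X2 names come from the log itself, and writing each state top-first, as in [X2, X1], is an assumption about the trace's notation.)

    # Model the stack as a Python list: append/pop work on the end,
    # so the last element is the top of the stack (LIFO).
    stack = ["X1", 2]        # top of stack is the 2
    stack.pop()              # pop the 2  -> ["X1"]
    stack.append("X2")       # push X2 on top -> ["X1", "X2"]

    # The log writes states top-first, i.e. [X2, X1]:
    print(list(reversed(stack)))   # ['X2', 'X1']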
| ▲ | garciasn 6 hours ago | parent | prev [-] | | Depends on the definition of reasoning: 1) think, understand, and form judgments by a process of logic. --- LLMs do not think, nor do they understand; they also cannot form 'judgments' in any human-relatable way. They're just providing results in the most statistically relevant way their training data permits. 2) find an answer to a problem by considering various possible solutions. --- LLMs can provide a result that may be an answer after providing various results that must be verified as accurate by a human, but they don't do this in any human-relatable way either. --- So: while LLMs continue to be amazing mimics, and thus APPEAR to be great at 'reasoning', they aren't doing anything of the sort today. | | |
| ▲ | CamperBob2 5 hours ago | parent [-] | | Exposure to our language is sufficient to teach the model how to form human-relatable judgements. The ability to execute tool calls and evaluate the results takes care of the rest. It's reasoning. | | |
| ▲ | garciasn 5 hours ago | parent [-] | | SELECT next_word, likelihood_stat
FROM context
ORDER BY 2 DESC
LIMIT 1
is not reasoning; it just appears that way due to Clarke's third law. | | |
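(The SQL above is a caricature of greedy next-token selection. A minimal Python sketch of the same idea; the function name and the per-token scores here are hypothetical and illustrative, not any real model's API. Softmax is monotonic, so taking the max score is exactly the "ORDER BY 2 DESC LIMIT 1" step.)

    import math

    def greedy_next_token(logits: dict[str, float]) -> str:
        # Pick the highest-scoring token -- the "ORDER BY 2 DESC LIMIT 1".
        return max(logits, key=logits.get)

    # Hypothetical per-token scores for some context (illustrative only).
    logits = {"the": 3.1, "a": 2.4, "stack": 5.7, "banana": -1.0}

    # Softmax turns the scores into the probabilities the model actually computes.
    total = sum(math.exp(v) for v in logits.values())
    probs = {t: math.exp(v) / total for t, v in logits.items()}

    print(greedy_next_token(logits))   # 'stack'
    print(round(probs["stack"], 2))    # ~0.9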
| ▲ | int_19h 4 hours ago | parent | next [-] | | Sure, at the end of the day it selects the most probable token - but it has to compute the token probabilities first, and that's the part where it's hard to see how it could possibly produce a meaningful log like this without some form of reasoning (and a world model to base that reasoning on). So, no, this doesn't actually answer the question in a meaningful way. | | | |
| ▲ | CamperBob2 5 hours ago | parent | prev [-] | | (Shrug) You've already had to move your goalposts to the far corner of the parking garage down the street from the stadium. Argument from ignorance won't help. |
| ▲ | fsloth 12 hours ago | parent [-] | | "but don't share the prompts." To be honest, I generally don't want to see anyone else's prompts, because what works is so damn context-sensitive, and it seems random what works and what doesn't. Even if someone else has a brilliant prompt, there's no guarantee it works for me. If you're working with something like Claude Code, you tell it what you want. If the result isn't what you wanted, you delete everything and add more specifications: "Hey, I would like to create a drawing app SPA in HTML that works like the old MS Paint." If you have _no clue_ what to prompt, you can start by asking the LLM (or another LLM) for a prompt. There are no manuals for these tools, and frankly they are irritatingly random in their capabilities. They are _good enough_ that I always end up trying to use them for every novel problem I come face to face with, and they work maybe 30-50% of the time. And sometimes reach 100%. | | |