conorbergin · 3 hours ago
I don't think LLMs are that chaotic: you can replace words in an input and get a similar answer, and they are very good at dealing with typos. They are definitely not interpretable, though. I was reading some material from mechanistic interpretability researchers saying they've given up trying to build a bottom-up model of how LLMs work.
mylifeandtimes · 2 hours ago · parent
> I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.

Compare "You are a helpful assistant. Your task is to <100 lines of task description> <example problem>" with "You are a helpless assistant. Your task is to <100 lines of task description> <example problem>". I've changed 3 or 4 CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt, and the outputs are not at all similar.

Just realized I've never tried the "you are a helpless ass" prompt. Again a very minor change in wording, just dropping a few letters. The helpless assistant at least output text apologizing for being so bad at the task.
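The "3 or 4 characters" claim can be checked mechanically with edit distance. A minimal sketch (plain-Python Levenshtein implementation, added here purely for illustration, not part of the original comment):

```python
# Levenshtein distance via the classic dynamic-programming recurrence,
# to measure how small the "helpful" -> "helpless" perturbation really is.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))          # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # delete ca
                            curr[j - 1] + 1,                # insert cb
                            prev[j - 1] + (ca != cb)))      # substitute (free if equal)
        prev = curr
    return prev[-1]

edits = levenshtein("You are a helpful assistant.",
                    "You are a helpless assistant.")
print(edits)  # 4: f->l, u->e, l->s, plus one inserted s
```

Four character-level edits, while the rest of the 1000+ character prompt is untouched, yet (per the comment) the outputs diverge completely.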
| ||||||||