conorbergin · 3 hours ago
I don't think LLMs are that chaotic: you can replace words in an input and get a similar answer, and they are very good at dealing with typos. They are definitely not interpretable, though. I was reading some material from mechanistic interpretability researchers saying they've given up trying to build a bottom-up model of how LLMs work.
mylifeandtimes · 2 hours ago · parent
> I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.

Compare "You are a helpful assistant. Your task is to <100 lines of task description> <example problem>" with "You are a helpless assistant. Your task is to <100 lines of task description> <example problem>". I've changed 3 or 4 CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt, and the outputs are not at all similar.

Just realized I've never tried the "you are a helpless ass" prompt. Again a very minor change in wording, just dropping a few letters. The helpless assistant at least output text apologizing for being so bad at the task.
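The "3 or 4 characters" claim can be checked mechanically with edit distance. A minimal sketch (plain-Python Levenshtein implementation, added here purely for illustration, not part of the original comment):

```python
# Levenshtein distance via the classic dynamic-programming recurrence,
# to measure how small the "helpful" -> "helpless" perturbation really is.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))          # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                    # delete ca
                            curr[j - 1] + 1,                # insert cb
                            prev[j - 1] + (ca != cb)))      # substitute (free if equal)
        prev = curr
    return prev[-1]

edits = levenshtein("You are a helpful assistant.",
                    "You are a helpless assistant.")
print(edits)  # 4: f->l, u->e, l->s, plus one inserted s
```

Four character-level edits, while the rest of the 1000+ character prompt is untouched, yet (per the comment) the outputs diverge completely.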
| ||||||||