Remix.run Logo
Detrytus 3 days ago

I wonder if the problem here is simply hitting some internal quota on compute resources? Like, if you send the model on wild goose chase with irrelevant information it wastes enough compute time on it that it fails to arrive at correct answer to main question.

cantor_S_drug 3 days ago | parent [-]

Possibly. But could indicate that initial tokens set the direction or the path model could go down into. Just like when a person mentions two distinct topics in conversation nearby, the listener decides which topic to continue with.