Remix.run Logo
mmoskal 6 days ago

The way I understand it: if the instruction are at the top, the KV entries computed for "content" can be influenced by the instructions - the model can "focus" on what you're asking it to do and perform some computation, while it's "reading" the content. Otherwise, you're completely relaying on attention to find the information in the content, leaving it much less token space to "think".