My take is that it's a way to bring more relevant tokens into the context, which then influence the final answer. It's a bit like RAG, but it's using training data instead!