cyanydeez 6 hours ago
I think it's basically equivalent to End of Line for an LLM: it has nothing else to add, there's no remaining context to draw from, and the probability chain you've been following is exhausted. But it's still required to generate a next token, and positive reinforcement is _how these models are trained_ in many cases, so the token of choice naturally reflects that training. It's a probability engine that doesn't know the difference between the instruction and the output, so the "great idea" comes from the reinforcement-learning instruction rather than from the answer portion of the generation.
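The "probability engine" framing can be sketched with a toy greedy decoder. All names and numbers below are invented for illustration; a real LLM scores tens of thousands of tokens with a neural network, but the selection step is this simple:

```python
# Toy sketch (illustrative only, not a real LLM): a "probability engine"
# that just emits the most likely next token. Nothing in this step knows
# whether a candidate token belongs to the instruction or the answer.
probs = {
    "great": 0.42,  # hypothetically boosted by positive-feedback training
    "</s>": 0.31,   # end-of-sequence token
    "hmm": 0.27,
}

def next_token(distribution):
    # Greedy decoding: pick the token with the highest probability.
    return max(distribution, key=distribution.get)

print(next_token(probs))  # -> "great", even when "</s>" might be more honest
```

If training has nudged enough probability mass onto agreeable tokens, "great" wins the argmax even in spots where ending the sequence would be the better answer.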