freehorse a day ago |
It is nonsense to take whatever an LLM writes in its CoT too seriously. I tried to classify some messy data, instructing the model "if edge case X appears, then do Y instead of Z". The model's CoT took notice of X, wrote that it should do Y, and then... did not do it in the actual output. The only way to make actual use of LLMs imo is to treat them as what they are: models that generate text based on statistical regularities, without any actual understanding or concepts behind it. Once that is understood, one can figure out how to set things up to optimise for the desired output (or "alignment"). The way "alignment research" presents models as if they actually think or have intentions of their own (hence the choice of the word "alignment") makes no sense to me.
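To make the "set things up" point concrete: since the CoT acknowledging the rule doesn't guarantee the output follows it, one option is to enforce the rule deterministically in post-processing instead of trusting the model. A minimal sketch (the detector, the labels, and the classification task are all hypothetical, just standing in for the X/Y/Z example above):

```python
def has_edge_case(text: str) -> bool:
    # Hypothetical detector for edge case "X"; in practice this
    # would be whatever deterministic check defines the edge case.
    return "EDGE_CASE_MARKER" in text

def final_label(text: str, llm_label: str) -> str:
    # Post-hoc guard: if edge case X is present, force label "Y"
    # no matter what the model (or its CoT) claimed.
    if has_edge_case(text):
        return "Y"
    return llm_label

print(final_label("normal record", "Z"))            # model's label passes through
print(final_label("EDGE_CASE_MARKER record", "Z"))  # overridden to "Y"
```

The point is that the guarantee lives in code you control, not in the model's stated reasoning.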