Majromax 13 hours ago

> That nudge is the flinch. It is the gap between the probability a word deserves on pure fluency grounds and the probability the model actually assigns it.

Hold up, what is the 'probability a word deserves on pure fluency grounds'?

Given that these models are next-token predictors (rather than BERT-style mask-filters), "the family faces immediate [financial]" is a perfectly reasonable continuation. Searching for this phrase on Google (verbatim mode, with quotes) gives 'eviction,' 'grief,' 'challenges,' 'financial,' and 'uncertainty.'

I could buy this measure if there were some contrived way to force the answer, such as "Finish this sentence with the word 'deportation': the family faces immediate", but that would contradict the naturalistic framing of 'the flinch'.

We could define the probability based on bigrams/trigrams in a training corpus, but that would privilege one corpus over the others, and it seems inconsistent with the article's later use of 'the Pile' as the best possible open-data corpus for unflinching models.
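
To make the n-gram idea concrete, here is a minimal sketch of a trigram-count estimate of P(next word | "faces immediate"). The function name `trigram_next_word_probs` and the four-sentence `toy_corpus` are made up for illustration and are not from the article; the point is only that the resulting "fluency" probability depends entirely on which corpus you count over.

```python
from collections import Counter, defaultdict


def trigram_next_word_probs(corpus, context):
    """Estimate P(next word | two-word context) from raw trigram counts.

    Hypothetical helper: no smoothing, whitespace tokenization only.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Slide a three-word window over the sentence.
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
    following = counts[tuple(context)]
    total = sum(following.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in following.items()}


# Toy corpus, invented for this example (an assumption, not real data).
toy_corpus = [
    "the family faces immediate eviction",
    "the family faces immediate financial hardship",
    "the family faces immediate financial ruin",
    "the family faces immediate deportation",
]

probs = trigram_next_word_probs(toy_corpus, ("faces", "immediate"))
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

Swap in a different corpus and the ranking changes, which is exactly the "privilege one corpus" problem.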

next_xibalba 13 hours ago

I believe what they're saying is that they attempted to fine-tune both Qwen and Pythia on Karoline Leavitt's "corpus" (I guess transcripts of press conferences), where she presumably uses the word "deportation" far more often than you'd see in a randomly selected document.

The top token from the Pythia fine-tune makes sense in the context of the complete sentence:

"THE FAMILY FACES IMMEDIATE DEPORTATION WITHOUT ANY LEGAL RECOURSE."

Whereas the Qwen prediction doesn't:

"THE FAMILY FACES IMMEDIATE FINANCIAL WITHOUT ANY LEGAL RECOURSE."

aesthesia 9 hours ago

They mention fine-tuning an abliterated (post-trained) Qwen3.5 on Karoline Leavitt transcripts, but they don't mention doing this for the base models they test, and I suspect they didn't. For their use case (generating plausible things Karoline Leavitt would say?) I feel like a base-model fine-tune would be a better fit anyway.
