NitpickLawyer 4 hours ago

> We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia.

Whatever they did on Llama didn't work: nothing makes sense in their example where they ask the model to lie about 1+1. Either the model is too old or their setup isn't working, but the autoencoder's output is nothing like their examples with Claude. Gemma is similarly bad.

fredericoluz 4 hours ago

it seems that the examples they showed off with haiku work. i'd guess llama is just too bad

fredericoluz 4 hours ago

same. i'm trying to trigger the 'mom is in the next room' russian thing but the model thinks the sentence is from american reddit.

zozbot234 2 hours ago

AIUI the paper's examples are from a version of Claude, not Llama? The thinking process is going to be extremely model-specific.