NitpickLawyer 4 hours ago
> We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia.

Whatever they did on Llama didn't work; nothing makes sense in their example where they ask the model to lie about 1+1. Either the model is too old or whatever they used isn't working, but the autoencoder's output is nothing like their examples with Claude. Gemma is similarly bad.
fredericoluz 4 hours ago
It seems the examples they showed off with Haiku work. I'd guess Llama is just too weak.
fredericoluz 4 hours ago
Same. I'm trying to trigger the "mom is in the next room" Russian example, but the model thinks the sentence is from American Reddit.