armchairhacker 10 days ago

Any suggestions from this literature?

libraryofbabel 10 days ago

The papers from Anthropic on interpretability are pretty good. They look at how certain concepts are encoded within the LLM.