visarga · 5 hours ago
Beautiful idea: an autoencoder must represent everything, without hiding anything, if it is to recover the original data closely. So this trains a model to verbalize its embeddings faithfully, which could reveal what we want to know about the model (such as when it thinks it is being tested, or other hidden thoughts).
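A minimal NumPy sketch of the bottleneck intuition, under toy assumptions (vocabulary size, uniform quantization, and all names here are illustrative, not any paper's method): if the decoder only ever sees the token sequence, then reconstruction quality bounds how much information the "explanation" can omit.

```python
import numpy as np

# Toy "verbalization bottleneck": an embedding may only reach the
# decoder as a sequence of discrete tokens (the "explanation"), so
# any information the decoder needs must survive in those tokens.
# VOCAB and DIM are arbitrary toy values.

VOCAB = 16    # size of the toy token vocabulary ("language")
DIM = 6       # embedding dimension

def verbalize(v):
    """Encode an embedding in [-1, 1] as one token per coordinate
    via uniform quantization."""
    return np.clip(np.round((v + 1) / 2 * (VOCAB - 1)), 0, VOCAB - 1).astype(int)

def reconstruct(tokens):
    """Decode the token sequence back into an approximate embedding."""
    return tokens / (VOCAB - 1) * 2 - 1

rng = np.random.default_rng(0)
embedding = rng.uniform(-1, 1, DIM)
tokens = verbalize(embedding)
recon = reconstruct(tokens)
err = np.max(np.abs(embedding - recon))

# Uniform quantization error is bounded by half a step, 1 / (VOCAB - 1),
# so good reconstruction forces the tokens to carry the embedding.
assert err <= 1 / (VOCAB - 1) + 1e-9
```

Note that the tokens here are "readable" only because the codebook is fixed and public; a learned codec could carry the same information through a mapping humans cannot decipher.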
sobellian · 3 hours ago · parent
It could just invent its own secret language embedded in English, akin to steganography. The explanation would lose no information but would remain uninterpretable by humans.