Steering interpretable language models with concept algebra (guidelabs.ai)
33 points by luulinh90s a day ago | 3 comments
giang_at_glai 16 hours ago
Author here. This post shows “concept algebra” on a language model: injecting, suppressing, and composing human-understandable concepts at inference time (no retraining, no prompt engineering). There’s an interactive demo in the post. Would love feedback on: (1) which steering tasks you’d benchmark, (2) which failure cases you’d want to see, (3) whether this kind of compositional control is useful in real products.