Show HN: Steerling-8B, a language model that can explain any token it generates (guidelabs.ai)
83 points by adebayoj 5 hours ago | 11 comments
gormen 41 minutes ago
Most interpretability methods fail for LLMs because they try to explain outputs without modeling the intent, constraints, or internal structure that produced them. Token‑level attribution is useful, but without a framework for how the model reasons, you’re still explaining shadows on the wall. | ||||||||||||||||||||||||||
ottah 33 minutes ago
It's a neat party trick, but explainability isn't a solution to any AI safety issue I care about. It's a distraction from the real problems, which are everything else around the model: the inflexible bureaucratic systems that make it hard to exercise rights and that deflect accountability.
umairnadeem123 26 minutes ago
The practical value here is for regulated domains. In healthcare and finance you often can't deploy a model at all unless you can explain why it made a specific decision. Token-level attribution that traces back to training data sources could satisfy audit requirements that currently block LLM adoption entirely. Curious how the performance compares to a standard Llama 8B on benchmarks - interpretability usually comes with a quality tax.
brendanashworth 2 hours ago
Is there a reason people don't use SHAP [1] to interpret language models more often? The in-context attribution of outputs seems very similar. | ||||||||||||||||||||||||||
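For context on what SHAP-style attribution means for tokens: the idea is to treat each input token as a "player" and credit it with its average marginal contribution to the model's output score. Below is a minimal, self-contained sketch of that exact Shapley computation in pure Python, using a toy scoring function as a stand-in for a real model's log-probability (the SHAP library itself uses sampling-based approximations, since the exact sum is exponential in the number of tokens; the token list and scorer here are illustrative assumptions, not anything from the linked model).

```python
from itertools import combinations
from math import factorial

def shapley_token_attribution(tokens, score):
    """Exact Shapley values: each token's average marginal contribution
    to score() over all subsets of the other tokens.

    score: maps a tuple of tokens (a subset, order preserved) to a number,
    e.g. a model's score for some output. Exponential in len(tokens);
    libraries like SHAP approximate this by sampling subsets.
    """
    n = len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = tuple(tokens[j] for j in sorted(subset + (i,)))
                without_i = tuple(tokens[j] for j in subset)
                values[i] += weight * (score(with_i) - score(without_i))
    return values

# Toy scorer (assumption): counts positive words, standing in for a model.
POSITIVE = {"great", "good"}
score = lambda toks: sum(t in POSITIVE for t in toks)
vals = shapley_token_attribution(["the", "movie", "was", "great"], score)
print(vals)  # attribution concentrates on "great"
```

One nice property this preserves: the values sum to score(all tokens) minus score(empty), so the explanation fully accounts for the output.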
| ||||||||||||||||||||||||||
pbmango 2 hours ago
This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it is either solved, or too far out of reach to bother stopping and thinking about.
great_psy 2 hours ago
Maybe I’m not creative enough to see the potential, but what value does this bring? Given the example I saw about CRISPR, what does this model's output give over a different, non-explaining model? Does it really make me more confident in the output if I know the data came from arXiv or Wikipedia? I find LLM outputs are subtly wrong, not obviously wrong.
| ||||||||||||||||||||||||||
rvz 2 hours ago
Now this is something very interesting to see, and it might be the answer to the explainability issue with LLMs, which could unlock many more use cases that are currently off limits. We'll see.