My Calculator Is a Transformer (sinclairs.gitlab.io)
3 points by radarsat1 6 hours ago | 1 comment

radarsat1 6 hours ago:
Recently I got interested in how to "compile" a program definition into the weights of a Transformer. I settled on distilling the MLPs individually, but the attention weights are fully "calculated". The example program [1] generates a Transformer that executes an RPN expression, using "breadcrumb" tokens to track its progress. The output looks like:
I think there's still a lot that could be improved, but I wanted to document what I have done so far. It turned out to be very interesting and made me think about Transformers, attention, and particularly the structure of the residual stream in a new way.

[1]: https://github.com/radarsat1/rpn_transformer/blob/main/src/p...
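For readers unfamiliar with RPN: the computation the Transformer is compiled to perform is ordinary stack-based postfix evaluation. This is a minimal, hypothetical sketch of that evaluation in plain Python (not the linked project's code, which instead encodes the procedure in attention and MLP weights):

```python
# Conventional stack-based RPN (postfix) evaluator -- a hypothetical
# sketch of the computation the generated Transformer carries out.
def eval_rpn(tokens):
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            # Operator: pop two operands, push the result.
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            # Operand: push it onto the stack.
            stack.append(int(tok))
    return stack[0]

print(eval_rpn("3 4 + 2 *".split()))  # (3 + 4) * 2 -> 14
```

In the Transformer version, the stack state has to live in the residual stream and the "pop/push" steps are driven by the breadcrumb tokens rather than an explicit loop.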