there's also a DeepSeek whitepaper on this technique https://www.seangoedecke.com/text-tokens-as-image-tokens