All the frontier LLMs are multi-modal.
But voxel art isn't one of their modes.
For other modes, such as image generation, the model has a dedicated image head that is trained jointly with the rest of the network.