Any reason for using a transformer architecture? Take a look at https://olmocr.allenai.org/, which in my opinion does the best handwritten-to-LaTeX conversion and also uses a VLM.
Also, maybe don't use LLMs to generate your HN description xD