resters | 6 days ago
FWIW I think a multimodal model could be trained to do extremely well at this given sufficient training data: a textual description of the system and/or diagram, the diagram source code (Mermaid, SVG, etc.), and the resulting rendered image, with the model trained to translate between all three.
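To make that concrete, one training example might pair those three representations of the same diagram, roughly like this (a minimal Python sketch; the type and field names are illustrative assumptions, not from any existing dataset):

    from dataclasses import dataclass

    @dataclass
    class DiagramExample:
        description: str       # natural-language description of the system/diagram
        source: str            # diagram source code, e.g. Mermaid or raw SVG
        rendered_image: bytes  # the image produced by rendering that source

Training the model to map any one of these fields to the others would cover text-to-diagram, diagram-to-text, and source-to-image understanding in a single dataset.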
bangaladore | 6 days ago | parent
Agreed. Even without fine-tuning, I'm sure a service like this already exists (or could easily exist) where the workflow is something like:

1. User provides information.
2. LLM generates structured output in whatever modeling language.
3. The same or another multimodal LLM reviews the rendered graph for styling/positioning issues and checks that it matches the user's request.
4. LLM generates updated structured output based on the feedback.
5. etc.

But you could probably fine-tune a multimodal model to do it in one shot, or at least far more effectively. A rough sketch of that loop is below.
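Here is a minimal Python sketch of that generate -> render -> review loop. The helper functions are placeholders standing in for the LLM calls and the renderer; none of the names refer to a real service or API.

    from dataclasses import dataclass

    @dataclass
    class Review:
        approved: bool
        notes: str = ""

    def generate_diagram_source(request: str, notes: str = "") -> str:
        # Placeholder: a text LLM would emit Mermaid/SVG here, optionally
        # incorporating the reviewer's feedback notes.
        return f"graph TD; A[{request}] --> B[done]"

    def render_to_image(source: str) -> bytes:
        # Placeholder: a deterministic renderer (e.g. a Mermaid CLI) would
        # turn the generated source into a PNG/SVG.
        return source.encode()

    def review_render(request: str, source: str, image: bytes) -> Review:
        # Placeholder: a multimodal LLM would inspect the rendered image for
        # styling/positioning issues and check it matches the request.
        return Review(approved=True)

    def diagram_from_request(request: str, max_rounds: int = 3) -> str:
        """Iterate generate -> render -> review until the reviewer approves."""
        notes = ""
        source = ""
        for _ in range(max_rounds):
            source = generate_diagram_source(request, notes)    # steps 1-2
            image = render_to_image(source)
            review = review_render(request, source, image)      # step 3
            if review.approved:
                break
            notes = review.notes                                 # step 4: feed feedback back in
        return source

    print(diagram_from_request("order processing pipeline"))

A fine-tuned multimodal model would collapse most of this loop into a single call, since it could "see" the layout problems while generating the source in the first place.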