More likely, you would just train a model to emit SVG from a description of a scene, and create the training data from raster images.
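As a rough illustration of what "training data from raster images" could look like, here is a minimal sketch that turns a captioned raster into a (description, SVG) pair. Everything in it is an assumption for illustration: `raster_to_svg` and `make_training_example` are hypothetical helpers, the `prompt`/`completion` record layout is just one possible format, and the run-length "vectorizer" is a naive stand-in for a real tracing tool.

```python
import json


def raster_to_svg(pixels):
    """Naively vectorize a raster: one <rect> per horizontal run of equal color.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    A real pipeline would presumably use a proper tracer instead.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    rects = []
    for y, row in enumerate(pixels):
        x = 0
        while x < width:
            color = row[x]
            run_start = x
            # Advance to the end of the current run of identical pixels.
            while x < width and row[x] == color:
                x += 1
            r, g, b = color
            rects.append(
                f'<rect x="{run_start}" y="{y}" width="{x - run_start}" '
                f'height="1" fill="#{r:02x}{g:02x}{b:02x}"/>'
            )
    body = "".join(rects)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">{body}</svg>'
    )


def make_training_example(description, pixels):
    """Pair a scene description with the SVG derived from its raster image."""
    return {"prompt": description, "completion": raster_to_svg(pixels)}


RED = (255, 0, 0)
WHITE = (255, 255, 255)

# A tiny 3x2 "image": a red bar in the top row on a white background.
image = [
    [WHITE, RED, RED],
    [WHITE, WHITE, WHITE],
]

example = make_training_example("a red bar on a white background", image)
print(json.dumps(example))
```

The point is only the shape of the data: the description becomes the input and the SVG text becomes the target. How good the resulting model is would depend heavily on the quality of the vectorization step, which this sketch does not try to address.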