it's not an image and audio model, so I don't think it would work for me on its own
I'd probably need multiple models running in separate containers, with another process coordinating them
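
a rough sketch of what that coordinating process could look like, assuming each model sits behind its own HTTP endpoint in its own container. The names `coordinate` and `http_backend`, the modality keys, and the URLs in the comments are all hypothetical and only illustrate the shape of the idea:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A backend stands in for one model served from its own container:
# it takes raw bytes (an image, an audio clip) and returns text.
Backend = Callable[[bytes], str]


def http_backend(url: str, timeout: float = 30.0) -> Backend:
    """Build a backend that POSTs the payload to a container's endpoint.

    The URL and the plain-bytes request format are assumptions; a real
    model server will define its own API.
    """
    def call(payload: bytes) -> str:
        req = urllib.request.Request(url, data=payload, method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    return call


def coordinate(inputs: Dict[str, bytes],
               backends: Dict[str, Backend]) -> Dict[str, str]:
    """Send each input to the backend registered for its modality, in parallel."""
    missing = set(inputs) - set(backends)
    if missing:
        raise KeyError(f"no backend for: {sorted(missing)}")
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        futures = {name: pool.submit(backends[name], data)
                   for name, data in inputs.items()}
        # Collect results by modality; any backend error is re-raised here.
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical wiring, one container per model:
# backends = {
#     "image": http_backend("http://image-model:8000/infer"),
#     "audio": http_backend("http://audio-model:8000/infer"),
# }
# results = coordinate({"image": image_bytes, "audio": audio_bytes}, backends)
```

the coordinator only knows about callables, so it can be tried out with stub functions before any containers exist.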