Need multimodal, a body, and fully online.
In the meantime, strictly language, audio, and video will go pretty far.