I wonder how well AI audio generation would work here for producing a voiceover video like the original input.