The model giving it text to speak would have to annotate the text in order for the TTS to add the affect. The TTS wouldn't "remember" such instructions from a speech to text stage previously.