You should be able to pretrain it once on generic text, then duplicate the input layer and fine-tune on conversation data.
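To make that concrete, here is a rough sketch of the "duplicate the input layer" step in plain Python. It rests on assumptions that are mine, not from the suggestion: that the input layer is an embedding-style weight table, and that the copy is meant to give a second input stream (say, the other speaker's turn) its own weights. `TinyModel`, `duplicate_input_layer`, and `encode` are hypothetical names for illustration, and both the pretraining and the fine-tuning are only stubbed out.

```python
import copy
import random


class TinyModel:
    """Toy stand-in for a pretrained network (hypothetical, for illustration)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input layer to start with: a vocab_size x hidden_size weight table.
        self.input_layers = [
            [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
             for _ in range(vocab_size)]
        ]

    def duplicate_input_layer(self):
        # Deep copy, so the new layer starts from the pretrained weights
        # but can drift independently during fine-tuning.
        self.input_layers.append(copy.deepcopy(self.input_layers[0]))

    def encode(self, token_ids):
        # One token id per input stream; each stream uses its own input layer,
        # and the results are summed into a single hidden vector.
        hidden = [0.0] * self.hidden_size
        for layer, token_id in zip(self.input_layers, token_ids):
            for j, weight in enumerate(layer[token_id]):
                hidden[j] += weight
        return hidden


model = TinyModel(vocab_size=5, hidden_size=3)
# ... pretraining on generic text would happen here (omitted) ...

model.duplicate_input_layer()

# Stand-in for a fine-tuning update that only touches the second copy.
model.input_layers[1][2][0] += 0.5

hidden = model.encode([2, 2])  # same token fed to both streams
```

The point of the deep copy is that both input layers begin with identical pretrained weights, so fine-tuning on conversation only has to learn how they should differ, not relearn the input representation from scratch.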