Memory is usually slow, and at least among voice agents I haven't seen many leverage it. Are you building for the text modality only, or for audio as well?