Not sure if it's "realtime", but the recently released VibeVoice-ASR from Microsoft does do diarization: https://huggingface.co/microsoft/VibeVoice-ASR