It can be solved with speaker segmentation/embedding models, although the results are not perfect. In Hyprnote we have a Descript-like transcript editor that lets you easily edit and assign speakers. Once we integrate a speaker diarization model with that, I think we'll be in good shape.

If you are interested, you can join our Discord and follow updates. :) https://hyprnote.com/discord

mijoharas 8 days ago | parent | next [-]

Oh awesome, I was reading through to see whether it had speaker diarization (which is why I got rid of the whisper script I used).

I'll look forward to the Linux version.

Is there any chance of a headless mode? (I.e. start, and write transcript to stdout with some light speaker diarization markup. e.g. "Speaker1: text")
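To make the requested output concrete, here is a minimal sketch of the "Speaker1: text" markup in Python. It is not Hyprnote's API: the `Segment` type and `format_transcript` function are hypothetical names, and the sketch assumes some diarization step has already produced labeled text segments.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Segment:
    """One diarized chunk of speech (hypothetical shape, not a Hyprnote type)."""
    speaker: str
    text: str


def format_transcript(segments: Iterable[Segment]) -> Iterator[str]:
    """Yield 'Speaker: text' lines, merging consecutive segments by the same speaker."""
    current_speaker = None
    parts = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty segments entirely
        if seg.speaker != current_speaker and parts:
            # Speaker changed: flush what the previous speaker said.
            yield f"{current_speaker}: {' '.join(parts)}"
            parts = []
        current_speaker = seg.speaker
        parts.append(text)
    if parts:
        yield f"{current_speaker}: {' '.join(parts)}"


if __name__ == "__main__":
    demo = [
        Segment("Speaker1", "Hello there."),
        Segment("Speaker1", "How are you?"),
        Segment("Speaker2", "Fine, thanks."),
    ]
    # A headless mode could write these lines to stdout as they are produced.
    for line in format_transcript(demo):
        print(line)
```

Merging consecutive segments keeps the output readable when the diarizer splits one turn into several chunks; a streaming tool might prefer to emit each segment immediately instead.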

yujonglee 8 days ago | parent [-]

> Is there any chance of a headless mode?

Maybe. We might be able to add an extension system where each extension can access that info and do whatever it wants within the app.

> I'll look forward to the Linux version.

https://github.com/fastrepl/hyprnote/issues/67 We have an open issue for it. You might want to subscribe to it!

apwell23 9 days ago | parent | prev [-]

Our conference rooms even have some sort of rotating camera contraption that automatically focuses on the person speaking.