scosman 2 hours ago
I had the same experience, so I started building my own. All the problems are solvable; I’m just working on the polish.

- crash recovery: part one is using ADTS AAC (even if the process crashes, the audio is saved up to that point). Part two is isolating transcription and summarization in separate XPC services.
- disk space: AAC at 64 kbps mono solves it (about 29 MB per hour). Could use Opus for further reduction, but both are small.
- speaker bleed: macOS voice isolation processing solves this. It’s a nightmare to set up, but works great once done.
- library: using the argmax SDK, built by a bunch of ex-Apple on-device AI folks.

If it weren’t for CoreAudio, I’d say it was easy to make. Argmax, Whisper, and llama.cpp, wrapped in the right architecture, mostly just work.

I’m having fun nerding out on the details like custom vocabulary (getting the names of the people in the meeting right), inferring speaker names from the transcript, calendar integration, nice UI, etc.
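
To illustrate why ADTS helps with crash recovery: every ADTS frame starts with its own header (7 bytes without CRC) containing a sync word and a 13-bit frame length, so a file that was cut off mid-write is still a valid stream up to its last complete frame. Below is a minimal Python sketch of that idea, not the code from the app described above. The helper names `make_adts_frame` and `count_complete_frames` are hypothetical, and the payload is dummy bytes rather than real AAC data.

```python
from __future__ import annotations

ADTS_HEADER_LEN = 7  # header size when no CRC is present


def make_adts_frame(payload: bytes) -> bytes:
    """Wrap dummy payload in an ADTS header (AAC-LC, 44.1 kHz, mono, no CRC)."""
    frame_len = ADTS_HEADER_LEN + len(payload)  # length field includes the header
    header = bytes([
        0xFF,                                    # sync word, high 8 bits
        0xF1,                                    # sync low 4 bits, MPEG-4, layer 0, no CRC
        (1 << 6) | (4 << 2),                     # profile AAC-LC, sample-rate index 4 (44.1 kHz)
        (1 << 6) | ((frame_len >> 11) & 0x03),   # channel config 1 (mono), length bits 12-11
        (frame_len >> 3) & 0xFF,                 # length bits 10-3
        ((frame_len & 0x07) << 5) | 0x1F,        # length bits 2-0, buffer fullness high bits
        0xFC,                                    # buffer fullness low bits, 1 raw data block
    ])
    return header + payload


def count_complete_frames(data: bytes) -> tuple[int, int]:
    """Return (frame_count, usable_bytes) for the longest prefix of whole frames."""
    pos = 0
    frames = 0
    while pos + ADTS_HEADER_LEN <= len(data):
        # Check the 12-bit sync word and the two layer bits (always 0).
        if data[pos] != 0xFF or (data[pos + 1] & 0xF6) != 0xF0:
            break
        frame_len = (
            ((data[pos + 3] & 0x03) << 11)
            | (data[pos + 4] << 3)
            | (data[pos + 5] >> 5)
        )
        # Stop at a corrupt length or at a frame that was only partly written.
        if frame_len < ADTS_HEADER_LEN or pos + frame_len > len(data):
            break
        pos += frame_len
        frames += 1
    return frames, pos


if __name__ == "__main__":
    # Simulate a recording that crashed 40 bytes before the third frame finished.
    stream = b"".join(make_adts_frame(bytes(100)) for _ in range(3))
    frames, usable = count_complete_frames(stream[:-40])
    print(f"{frames} complete frames, {usable} usable bytes")
```

Truncating the file to `usable` bytes leaves a stream that a decoder can read from start to end, which is the property a container with a trailing index (such as a plain MP4/M4A written in one pass) does not give you after a crash.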