lxe 2 hours ago
I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today, but it doesn't have a rug-pull failure mode.
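For concreteness, a local GPU transcription call of the kind described in the first paragraph is only a few lines. This sketch uses faster-whisper, a different local Whisper runtime than whisper.cpp, chosen only to keep the Python example compact; the model name, device, and audio path are illustrative assumptions.

```python
# Sketch: local speech-to-text on an NVIDIA GPU, no cloud round trip.
# faster-whisper stands in for whisper.cpp here purely for brevity;
# "large-v3-turbo", device="cuda", and the wav path are assumptions.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

# Transcribe one push-to-talk utterance; greedy decoding keeps latency low.
segments, info = model.transcribe("utterance.wav", beam_size=1)
text = "".join(segment.text for segment in segments)
print(text.strip())
```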
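And a rough sketch of the accessibility-API context grab plus local-model cleanup described in the second paragraph, for Linux via AT-SPI (pyatspi). The Ollama endpoint, the llama3.2:3b model tag, and the prompt wording are assumptions for illustration, not something yapyap or the tool under discussion is known to do.

```python
# Sketch: pull lightweight context from the focused widget via AT-SPI,
# then ask a small local model to fix proper-noun spellings in a transcript.
# Assumes Linux with python3-pyatspi installed and a local Ollama server;
# both the model tag and the prompt are illustrative assumptions.
import json
import urllib.request

import pyatspi


def find_focused(node):
    """Depth-first search for the accessible that currently has keyboard focus."""
    try:
        if node.getState().contains(pyatspi.STATE_FOCUSED):
            return node
    except Exception:
        pass
    for i in range(node.childCount):
        child = node.getChildAtIndex(i)
        if child is None:
            continue
        hit = find_focused(child)
        if hit is not None:
            return hit
    return None


def gather_context():
    """Window title, focused field's text, and its accessible label: a few hundred bytes, no screenshots."""
    desktop = pyatspi.Registry.getDesktop(0)
    for i in range(desktop.childCount):
        app = desktop.getChildAtIndex(i)
        if app is None:
            continue
        focused = find_focused(app)
        if focused is None:
            continue
        field_text = ""
        try:
            text_iface = focused.queryText()
            field_text = text_iface.getText(0, text_iface.characterCount)
        except NotImplementedError:
            pass
        # The enclosing frame's name is usually the window title.
        title, node = "", focused
        while node is not None:
            if node.getRoleName() == "frame":
                title = node.name
                break
            node = node.parent
        return {"window": title, "label": focused.name, "field": field_text}
    return {}


def fix_transcript(transcript, ctx, model="llama3.2:3b"):
    """Ask a small local model to correct names/terms using the gathered context."""
    prompt = (
        "Correct any misspelled names or technical terms in the transcript, "
        "using the context. Return only the corrected transcript.\n\n"
        f"Window title: {ctx.get('window', '')}\n"
        f"Field label: {ctx.get('label', '')}\n"
        f"Existing field text: {ctx.get('field', '')}\n\n"
        f"Transcript: {transcript}"
    )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(fix_transcript("email jon smyth about the quartely report", gather_context()))
```

Walking the whole accessibility tree like this is the naive version; a real tool would listen for focus events instead of rescanning, but either way the payload sent to the model stays a few hundred bytes of plain text.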
Wowfunhappy 12 minutes ago | parent
> The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. By "any GPU" you mean a physical, dedicated GPU card, right? That's not a small requirement, especially on Macs. | ||