postalcoder 8 hours ago

One thing Apple really needs to get right is speech to text transcription. They've nailed accessibility in so many ways and yet it feels like they're a decade behind on properly transcribing voices. At least half a decade.

Input on the iPhone is so dreadful nowadays. Their palm rejection is definitely worse than before, so mistyping is more frequent. Their text-correction algorithm for typing is worse than before, and it frequently makes incorrect corrections that I don't notice, because it changes words a few words back from where I'm typing. And STT hasn't improved. On top of that, my fingers are tired of the phone form factor. Please make the iPhone not a chore to use, Apple.

hedora 5 hours ago | parent | next [-]

Until Siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps, I'm going to assume all the other, harder speech-to-text and conversational stuff is just vaporware.

I found another dreadful iPhone input "feature" yesterday. If you are browsing around in third-party CarPlay apps and are ready to tap your selection, but press the accelerator first, it truncates the list to only a few items and scrolls to the top.

Way to reduce driving distractions guys! What's next? If the car is moving, maps changes destinations?

I really wish human-computer interaction research were more broadly applied, and that if you did dumb stuff like the whole automotive / CarPlay world does, you'd be liable in court.

I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on. I'm sure at least one pedestrian was hit by a car in reverse while that screen was on. The manufacturer should be 100% liable for the poor UI decision.

gabeio 2 hours ago | parent | next [-]

> Until siri can reliably handle "Navigate to <business that is a decade old>", offline and using pre-downloaded maps

Yeah, that's unfortunate considering you can do nearly all of that yourself (download maps, navigate to a business, all while offline); the only part that doesn't work is asking Siri to do it for you.

> I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on.

My car pops up a dialog telling me (in a paragraph+) to pay attention while in semi-autopilot which I have to click "ok" on to get back to the map. It's very ironic, and extremely dangerous.

skygazer 2 hours ago | parent | prev [-]

I think their intent is actually safety. They employ two touch interaction models: Flexible while not moving and simplified while driving. For instance, keyboard input becomes unavailable while moving and you must rely on Siri. I personally find it irritating, particularly when I am a passenger, but I get it.

terabytest 7 hours ago | parent | prev | next [-]

Wispr Flow is a masterclass in STT. Apple's solution feels like it's from the last century in comparison. Same applies with Apple's TTS when you have ElevenLabs and OpenAI running laps around it. All I need is for my iPhone to do those things natively at the same quality level (because in Apple's walled garden that's the only way to get them usable everywhere).

jjice 7 hours ago | parent | next [-]

But Apple's uses so few system resources and runs fully on device on newer iPhone models (16+ I believe). It's so efficient. I really enjoy using Handy with Parakeet as the model, but the system resource usage is a monster compared to Apple's (although very good).

Looks like Wispr Flow uses a cloud model [0]:

> Cloud based speech processing infrastructure for 1B users

It gets to be a messy comparison: my iPhone can do STT pretty well with no latency, fully on device, while Wispr Flow requires a cloud model. To be fair, older Apple devices require one as well. It's not quite an apples-to-oranges comparison, but I think those technical details make it an indirect one in a few ways.

For on-device with low system resource usage, Apple's is pretty damn good.

[0] https://wisprflow.ai/post/technical-challenges

RobMurray 4 hours ago | parent | next [-]

Apple's STT has been on-device for a long time now, long before iPhone 16. I haven't noticed any improvements since my first ever iPhone 5S. I'm pretty sure Wispr Flow can use on-device models. I use Voiceink[0], which can use Parakeet models on-device and can optionally use cloud models. It's like night and day comparing Apple's to Voiceink. The only advantage I find to Apple's STT is less friction. 3rd party apps just can't integrate as smoothly with the system. There's a gesture to activate Apple dictation when VoiceOver is on.

georgel 3 hours ago | parent [-]

It's been around and available as an API to devs since at least 2021 in iOS. The problem was even on the best iPhone at that time, I could never get it past ~0.8x speed and after 15-20 minutes the device would heat up so much the display dimmed.

For context, I was working on a podcast app with on-device transcription, had to park that idea for years before it got to today's performance.
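To put the ~0.8x figure in perspective, here is a rough back-of-the-envelope sketch. It assumes "0.8x speed" means 0.8 minutes of audio transcribed per minute of wall-clock time; that reading, and the function name `transcription_minutes`, are illustrative assumptions rather than anything from Apple's API. Under that assumption a 60-minute episode needs about 75 minutes of continuous processing, well past the 15-20 minute mark where the device started to overheat.

```python
def transcription_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Wall-clock minutes needed to transcribe `audio_minutes` of audio.

    `speed_factor` is audio minutes processed per wall-clock minute
    (1.0 means real time). Treating "~0.8x speed" this way is an
    assumption made for illustration.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


if __name__ == "__main__":
    # Hypothetical episode lengths, all at the ~0.8x speed quoted above.
    for minutes in (20, 60, 90):
        needed = transcription_minutes(minutes, 0.8)
        print(f"{minutes} min of audio -> {needed:.1f} min of processing")
```

Anything below 1.0x means the transcript falls further behind the longer the episode runs, which is why slower-than-real-time transcription is a poor fit for a podcast player.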

arijun 6 hours ago | parent | prev | next [-]

Apple runs on-device on older models, too, just wimpier models.

Invictus0 5 hours ago | parent | prev [-]

human resources (my voice and time) are far more valuable than the system resources. going to the cloud is absolutely worth it to prevent a typo

rhdunn 5 hours ago | parent [-]

That doesn't work if you have limited or no connectivity (e.g. on a mountain range). There are also privacy concerns, e.g. a doctor using it to transcribe medical information.

adamcharnock 7 hours ago | parent | prev | next [-]

FWIW - I also really like Wispr Flow, but I moved to running the 'Whisper Large' model locally using Handy (https://github.com/cjpais/Handy), which has been essentially as good, while also having lower latency.

dceddia 5 hours ago | parent [-]

Handy is great. It exposes a bunch of open models beyond Whisper too, and though I haven’t tried too many of them, I’ll throw in a rec for the Parakeet model, which feels pretty much on par with Whisper for accuracy and is way, way faster.

primaprashant 3 hours ago | parent | prev [-]

I’d say STT is pretty much a solved problem. Every day there is a new product, and one can be one-shotted by any current top-of-the-line LLM. Take a look at this list [1]. Apple is just stuck in the past.

[1] https://github.com/primaprashant/awesome-voice-typing

leokennis 33 minutes ago | parent | prev | next [-]

All day every day my iPhone makes me feel like an idiot. I need to correct every other word I type (or at least what my iPhone thinks I typed). While correcting, autocorrect introduces new and even more baffling misspellings.

Sometimes it gets to “fever dream where you’re suddenly unable to successfully perform everyday tasks” levels of insanity.

And the worst part is: it used to be fine. I’d type more or less on full keyboard levels of speed and accuracy on my iPhone 4S.

OrvalWintermute 13 minutes ago | parent | prev | next [-]

I want to echo the comments that you just made.

One of my primary methods of interacting with an iPhone is through speech and the state of Apple speech transcription is pretty horrible. It bothers me greatly.

I know some of the workarounds and things, but it does feel like it’s in the Stone Age.

I don’t think it’s a microphone issue since iPhone microphones are fairly decent and I don’t think it’s a CPU issue either because Apple Silicon seems to be some of the best on the market. Which leaves us with the software…

Maybe they should put that cash hoard to good use and buy up some of these transcription companies or license their IP so we get truly high-quality transcription.

twoWhlsGud 4 hours ago | parent | prev | next [-]

I don't think things have improved much on that front since Colin Hughes gave a rundown of Voice Control's problems several years ago:

https://www.theregister.com/on-prem/2023/08/16/those-who-rel...

Would be great if they could at least fix two major bugs:

* input simply fails (seemingly) randomly where it is supported, and many apps from major vendors don't support dictation input at all (e.g. OneNote) (there should at least be a fallback (a la Dragon Dictate from decades ago) for those cases)

* capitalization is still random, leaving you with many errors to correct

but Apple mostly seems to see accessibility as something to use to enable performative press releases not actual functionality...

divbzero 5 hours ago | parent | prev | next [-]

It’d be amazing if speech-to-text could take into account context as well: Greek if I’m speaking Greek, Korean if I’m speaking Korean, or for (int i = 0; i < count; ++i) if I’m dictating code.

titzer 5 hours ago | parent | prev | next [-]

Apple dictation on macOS is actually pretty dang good. I've got it bound to a double-tap on fn and I use it pretty regularly.

Invictus0 5 hours ago | parent [-]

try wisprflow and then tell us it's good

prepend 2 hours ago | parent | next [-]

Wispr Flow is not $12/month better than iOS.

I’d much rather have “cheap, dependable, and good enough” over oligarch pricing for what used to be a one time software purchase any day.

titzer 5 hours ago | parent | prev [-]

I just installed this and already despise its pricing model. I trust this product approximately zero.

primaprashant 3 hours ago | parent | next [-]

Open-source STT apps are plenty and just as good. Pick one from this list:

https://github.com/primaprashant/awesome-voice-typing

wahnfrieden 4 hours ago | parent | prev [-]

there are plenty of free alternatives using the same models

WorldPeas 3 hours ago | parent | prev | next [-]

speaking of touch though, they mustn't have touched the swipe-typing feature in a while, because somehow it works even better than the keyboard for me most of the time! No nonsense words like "oul" instead of "oil" constantly.

throw03172019 7 hours ago | parent | prev | next [-]

I use Aqua Voice because Apple STT is so frustrating.

prepend 2 hours ago | parent | prev | next [-]

I turned off my iPhone’s autocorrect because it made too many stealth errors. Now I notice all my mistypes.

I have a friend named Zi in my contacts. For some reason iOS kept autocorrecting “I” to “Zi” and would do it too far back for me to notice.

What’s weird is how this is such a dumb bug that Apple usually irons out.

port11 5 hours ago | parent | prev [-]

There’s so much complaining about their keyboard issues, and it’s really an infuriating part of the iOS experience. The phone being hard to grip/slippery doesn't help, no…