| ▲ | bogtog 14 hours ago |
| Using voice transcription is nice for fully expressing what you want, so the model doesn't need to make guesses. I'm often voicing 500-word prompts. If you talk in a winding way that would look awkward as text, that's fine; the model will almost certainly be able to tell what you mean. Using voice-to-text is my biggest suggestion for people who want to use AI for programming.

(I'm not a particularly slow typer. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I need to also think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable. In general, I think just this lowered friction makes me much more willing to fully describe what I want.)

You can also ask it, "do you have any questions?" I find that saying "if you have any questions, ask me, otherwise go ahead and build this" rarely produces questions for me. However, if I say "Make a plan and ask me any questions you may have", then it usually has a few questions.

I've also found a lot of success when I tell Claude Code to emulate some specific piece of code I've previously written, either within the same project or something I've pasted in |
|
| ▲ | Marsymars 11 hours ago | parent | next [-] |
| > I'm not a particularly slow typer. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I need to also think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable.

This doesn't feel relatable at all to me. If my writing speed is bottlenecked by thinking about what I'm writing, and my talking speed is significantly faster, that just means I've removed the bottleneck by not thinking about what I'm saying. |
| |
| ▲ | eucyclos 2 hours ago | parent | next [-] | | It's often better to segregate creative and inhibitive systems even if you need the inhibitive systems to produce a finished work. There's a (probably apocryphal) conversation between George RR Martin and Stephen King that goes something like:

GRRM: How do you write so many books?... Don't you ever spend hours staring at the page, agonizing over which of two words to use, and asking 'am I actually any good at this?'

SK: Of course! But not when I'm writing. | |
| ▲ | hexaga 11 hours ago | parent | prev | next [-] | | Alternatively: some people are just better at / more comfortable thinking in auditory mode than visual mode & vice versa. In principle I don't see why they should have different amounts of thought. That'd be bounded by how much time it takes to produce the message, I think. Typing permits backtracking via editing, but speaking permits 'semantic backtracking', which isn't equivalent but can definitely do similar things. Language is powerful. And importantly, to backtrack in visual media I tend to need to re-saccade through the text with physical eye motions, whereas with audio my brain just has an internal buffer I can access at the speed of thought. Typed messages might have higher _density_ of thought per token, though how valuable is that really, in LLM contexts? There are diminishing returns on how perfect you can get a prompt. Also, audio permits a higher bandwidth mode: one can scan and speak at the same time. | |
| ▲ | bogtog 10 hours ago | parent | prev | next [-] | | That's fair. I sometimes find myself pausing or just talking in circles as I'm deciding what I want. I think when I'm speaking, I feel freer to use less precise/formal descriptions, but the model can still correctly interpret the technical meaning. In either case, different strokes for different folks, and what ultimately matters is whether you get good results. I think the upside is high, so I broadly suggest people try it out. | |
| ▲ | buu700 8 hours ago | parent | prev | next [-] | | I prefer writing myself, but I could see the appeal of producing a first draft of a prompt by dumping a verbal stream of consciousness into ChatGPT. That might actually be kind of fun to try while going on a walk or something. | |
| ▲ | dyauspitr 10 hours ago | parent | prev [-] | | I don’t feel restricted by my typing speed; speaking is just so much easier and more convenient. The vast majority of my ChatGPT usage is on my phone, and that makes speech-to-text a no-brainer. |
|
|
| ▲ | cjflog 3 hours ago | parent | prev | next [-] |
| 100% this. I built laboratory.love almost entirely with my voice and (now-outdated) Claude models.

My go-to prompt finisher, which I have mapped to a hotkey due to frequent use, is "Before writing any code, first analyze the problem and requirements and identify any ambiguities, contradictions, or issues. Ask me to clarify any questions you have, and then we'll proceed to writing the code" |
|
| ▲ | 14 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | journal 8 hours ago | parent | prev | next [-] |
| Voice transcription is silly when someone can hear you talking to something that isn't exactly human; imagine explaining that you were talking to an AI. Still, when it's more than one sentence, I use voice too. |
|
| ▲ | johnfn 14 hours ago | parent | prev | next [-] |
| That's a fun idea. How do you get the transcript into Claude Code (or whatever you use)? What transcription service do you use? |
| |
| ▲ | hn_throw2025 14 hours ago | parent | next [-] | | I'm not the person you're replying to, but I use Whispering connected to the whisper-large-v3-turbo model on Groq. It's incredibly cheap and works reliably for me. I've got it pasting my voice transcriptions into Chrome (Gemini, Claude, ChatGPT) as well as Cursor. https://github.com/EpicenterHQ/epicenter | |
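For anyone curious what the API side of that setup looks like, here's a minimal sketch against Groq's OpenAI-compatible transcription endpoint. The model name comes from the comment above; the env var and wav path are placeholders (assumes `pip install openai`):

    # Sketch: transcribe a recorded clip via Groq's OpenAI-compatible API.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GROQ_API_KEY"],          # placeholder env var
        base_url="https://api.groq.com/openai/v1",
    )

    with open("prompt_recording.wav", "rb") as audio:  # placeholder path
        transcript = client.audio.transcriptions.create(
            model="whisper-large-v3-turbo",
            file=audio,
        )

    print(transcript.text)  # ready to paste into Claude Code, Cursor, etc.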
| ▲ | rgbrgb 12 hours ago | parent | prev | next [-] | | I use Handy with Claude code. Nice to just have a key combo to transcribe into whatever has focus. https://github.com/cjpais/Handy | |
| ▲ | quinncom 13 hours ago | parent | prev | next [-] | | I use Spokenly with local Parakeet 0.6B v3 model + Cerebras gpt-oss-120b for post-processing (cleaning up transcription errors and fixing technical mondegreens, e.g., `no JS` → `Node.js`). Almost imperceptible transcription and processing delay. Trigger transcription with right ⌥ key. | | |
| ▲ | ctoth 11 hours ago | parent [-] | | According to Google this is the first time the phrase "technical mondegreens" was ever used. I really like it. |
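For the gist of the post-processing pass quinncom describes, a rough sketch: feed the raw transcript through a small chat-completion call that fixes technical mondegreens without otherwise rewriting it. The base URL and env var are assumptions based on Cerebras exposing an OpenAI-compatible API; the model name is the one given above:

    # Sketch: clean up STT output, e.g. "no JS" -> "Node.js".
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var
        base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    )

    def clean_transcript(raw: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[
                {"role": "system", "content": (
                    "Fix speech-to-text errors in the transcript, especially "
                    "misheard technical terms (e.g. 'no JS' -> 'Node.js'). "
                    "Preserve the wording; output only the corrected text."
                )},
                {"role": "user", "content": raw},
            ],
        )
        return resp.choices[0].message.content

    print(clean_transcript("let's move the no JS server over to postgres"))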
| |
| ▲ | hurturue 13 hours ago | parent | prev | next [-] | | Your OS might have a built-in dictation thing. Google for that and try it before reaching for online services. | |
| ▲ | singhrac 4 hours ago | parent | prev | next [-] | | I use VoiceInk (needed some patches to get it to compile but Claude figured it out) and the Parakeet V3 model. It’s really good! | |
| ▲ | bogtog 13 hours ago | parent | prev | next [-] | | There are a few apps nowadays for voice transcription. I've used Wispr Flow and Superwhisper, and both seem good. You can map some hotkey (e.g., ctrl + windows) to start recording, then when you press it again to stop, it'll get pasted into whatever text box you have open. Superwhisper offers some AI post-processing of the text (e.g., making nice bullets or fixing grammar), but this doesn't seem necessary and just makes things a bit slower. | |
| ▲ | elvin_d 10 hours ago | parent | prev [-] | | I made this tool: double-press Ctrl to start, then press Ctrl again to stop, which copies the transcription to the clipboard. https://github.com/elv1n/para-speak/ |
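All of these tools boil down to the same loop the commenters above describe: a hotkey toggles recording, the audio goes to a speech-to-text model, and the result lands on the clipboard (or the focused text box). A toy sketch of that loop, with a hypothetical transcribe() standing in for whichever backend you pick:

    # Toy sketch: hotkey toggles mic recording; on stop, transcribe
    # and copy the text to the clipboard. Requires:
    #   pip install sounddevice soundfile numpy pynput pyperclip
    import numpy as np
    import pyperclip
    import sounddevice as sd
    import soundfile as sf
    from pynput import keyboard

    SAMPLE_RATE = 16_000
    stream = None   # None while idle, an InputStream while recording
    frames = []

    def transcribe(path: str) -> str:
        raise NotImplementedError  # plug in Whisper, Parakeet, an API, ...

    def on_audio(indata, frame_count, time_info, status):
        frames.append(indata.copy())   # buffer each audio chunk

    def toggle():
        global stream, frames
        if stream is None:             # start recording
            frames = []
            stream = sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                                    callback=on_audio)
            stream.start()
        else:                          # stop, transcribe, copy
            stream.stop(); stream.close(); stream = None
            if frames:
                sf.write("clip.wav", np.concatenate(frames), SAMPLE_RATE)
                pyperclip.copy(transcribe("clip.wav"))

    def on_press(key):
        if key == keyboard.Key.alt_r:  # right Option/Alt, as in setups above
            toggle()

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()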
|
|
| ▲ | listic 14 hours ago | parent | prev | next [-] |
| Thanks for the advice! Could you please share how you enabled voice transcription in your setup and what it actually is? |
| |
| ▲ | binocarlos 13 hours ago | parent | next [-] | | I use https://github.com/braden-w/whispering with an OpenAI API key. I use a keyboard shortcut to start and stop recording, and it will put the transcription into the clipboard so I can paste into any app. It's a huge productivity boost - OP is right that you shouldn't overthink trying to be coherent - the models are very good at knowing what you mean (Opus 4.5 with Claude Code in my case) | |
| ▲ | abdullahkhalids 9 hours ago | parent | next [-] | | I just installed this app and it is very nice. The UX is very clean, and it transcribes whatever I say correctly. In fact, I'm transcribing this comment with this app just now. I am using Whisper Medium. The only problem I see is that at the end of the message it sometimes adds a "bye" or a "thank you", which is kind of annoying. | |
| ▲ | listic 12 hours ago | parent | prev [-] | | I'm quite ready to believe that with LLMs it's not worth trying too hard to be coherent: I've successfully used LLMs to make sense of what incoherent-sounding people say (in text). | |
| |
| ▲ | bogtog 13 hours ago | parent | prev | next [-] | | I'm using Wispr Flow, but I've also tried Superwhisper. Both are fine. I have a convenient hotkey to start/end recording; having it need just one hand is nice. I'm using this with the Claude Code vscode extension in Cursor. If you go down this route, the Claude Code instance should be moved into a separate window outside your main editor or else it'll flicker a lot | |
| ▲ | pzo 5 hours ago | parent [-] | | Another option is MacWhisper if someone is on macOS and doesn't want to pay for a subscription (it's a one-time payment). Pretty much all of these apps these days use Parakeet from NVIDIA, which is the fastest and best open-source model that can run on edge devices. Also, I haven't tried it, but in the latest macOS 26 Apple updated their STT models, so the built-in voice dictation may be good enough. |
| |
| ▲ | kapnap 13 hours ago | parent | prev | next [-] | | For me, on Mac, VoiceInk has been top notch. I got tired of Superwhisper. | |
| ▲ | lukax 10 hours ago | parent | prev [-] | | Spokenly on macOS with Soniox model. |
|
|
| ▲ | Applejinx 9 hours ago | parent | prev | next [-] |
| It's an AI. You might do better by phrasing it as 'Make a plan, and have questions'. There's nobody there, but if it's specifically directed to 'have questions', you may find they're good questions! Why ask whether it has questions, if you figure it'd be better to get questions? Just tell it to have questions, and it will. It's like a reasoning model: don't ask, prompt 'and here is where you come up with apropos questions', and you shall have them, possibly even in a useful way. |
|
| ▲ | j45 10 hours ago | parent | prev | next [-] |
| Speech also uses a different part of the brain, and maybe requires less finger coordination. |
|
| ▲ | dominotw 13 hours ago | parent | prev [-] |
| Surprised AI companies are not building this workflow in, instead of leaving it up to users to figure out how to get voice-to-text into the prompt. |
| |
| ▲ | alwillis 13 hours ago | parent | next [-] | | > Surprised AI companies are not building this workflow in, instead of leaving it up to users to figure out how to get voice-to-text into the prompt.

Claude on macOS and iOS has native voice-to-text transcription. I haven't tried it, but since you can access Claude Code from the apps now, I wonder if you can use the Claude app's transcription as input to Claude Code. | |
| ▲ | bogtog 13 hours ago | parent [-] | | > Claude on macOS and iOS has native voice-to-text transcription

Yeah, Claude/ChatGPT/Gemini all offer this, although Gemini's is basically unusable because it will immediately send the message if you stop talking for a few seconds. I imagine you totally could use the app transcript and paste it in, but keeping the friction to an absolute minimum (e.g., just needing to press one hotkey) feels nice |
| |
| ▲ | dyauspitr 10 hours ago | parent | prev [-] | | All the mobile apps make this very easy. |
|