| ▲ | blazingbanana 2 days ago |
| https://play.google.com/store/apps/details?id=com.blazingban... Completely free offline voice transcription: no ads, no in-app purchases, and no account or network required. I have also built the macOS/Windows/Linux versions, which I'll make free to download from my site soon (https://blazingbanana.com/). The iOS version is built and works (extremely well); I'm just waiting for the Apple Developer signup process to complete. Big shout out to https://github.com/mybigday/whisper.rn and https://huggingface.co/ggerganov/whisper.cpp/tree/main for making this even possible. Any suggestions are welcome. |
|
| ▲ | bazzargh a day ago | parent | next [-] |
| On the subject of whisper being great... A few weeks ago a co-worker commented about the difficulty he'd had editing a work demo, and I pointed him at various jump-cutting tools that automate what he'd done by hand in the past (editing out silences). But I'd also wanted to play with whisper for a while... So a couple of hours later I'd written a script that does transcription-based editing: on the first pass it grabs a timestamped transcript and a plain text transcript for editing; you edit the words into any order you like and a second pass reassembles the video (it's just a couple of hundred lines of Python wrapping whisper and ffmpeg). It also speeds up by 4x any silences detected within retained sequences in the video. Matching up transcripts turns out to be not that hard; I normalise the text, split it, and then compare it to the sequence of normalised words from the timestamped transcript. I find the longest common sequence, keep that, then recurse on the before/after sections (there's a little more detail, but not much). I also send the transcription to ffmpeg to burn in as captions, because the editing sometimes makes the audio choppy and the captions make it easier to follow. I know, tools have been doing this for years now. I just didn't have one to hand, and now I do, and I couldn't have done this without whisper. |
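The matching step described above can be sketched roughly as follows. This is not bazzargh's script, only a minimal illustration under stated assumptions: the timestamped transcript is assumed to be a list of `(word, start, end)` tuples, the helper names (`normalise`, `match_segments`, `keep_ranges`) are hypothetical, and the standard library's `difflib.SequenceMatcher.find_longest_match` stands in for the "longest common sequence" search. To allow reordering, each recursive call searches the whole original transcript, which is also an assumption.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_segments(matcher, lo, hi, n_original):
    """Return (original_start, length) runs, in edited order, for edited[lo:hi].

    Finds the longest contiguous run shared with the original transcript,
    keeps it, then recurses on the edited words before and after that run.
    Edited words that match nothing are silently dropped.
    """
    if lo >= hi:
        return []
    m = matcher.find_longest_match(lo, hi, 0, n_original)
    if m.size == 0:
        return []
    return (match_segments(matcher, lo, m.a, n_original)
            + [(m.b, m.size)]
            + match_segments(matcher, m.a + m.size, hi, n_original))


def keep_ranges(edited_text, timed_words):
    """Map an edited transcript to (start, end) time ranges to keep, in order."""
    edited = normalise(edited_text)
    original = ["".join(normalise(word)) for word, _, _ in timed_words]
    # autojunk=False so that frequent words ("the", "and") are not ignored
    # on long transcripts.
    matcher = SequenceMatcher(None, edited, original, autojunk=False)
    runs = match_segments(matcher, 0, len(edited), len(original))
    return [(timed_words[b][1], timed_words[b + size - 1][2]) for b, size in runs]


# Hypothetical word-level timestamps (seconds), made up for illustration.
timed = [
    ("Hello", 0.0, 0.4), ("and", 0.4, 0.6), ("welcome", 0.6, 1.0),
    ("to", 1.0, 1.1), ("the", 1.1, 1.2), ("demo.", 1.2, 1.8),
    ("First", 2.0, 2.3), ("some", 2.3, 2.5), ("setup.", 2.5, 3.0),
]
ranges = keep_ranges("First some setup. Hello and welcome.", timed)
print(ranges)
```

Each returned range could then be cut and concatenated with ffmpeg in the order given; that part is left out here because the comment does not say how the script does it.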
| |
| ▲ | blazingbanana 21 hours ago | parent [-] | | That is absolutely awesome and I love hearing about the tools that people build themselves! Honestly, the capabilities of whisper are insane, and the fact that it's free and open source is really a gift. Some of the things it can do feel almost sci-fi. If you ever decide to release it publicly please let me know, it sounds like a very useful tool. |
|
|
| ▲ | seinecle 2 days ago | parent | prev | next [-] |
| Couldn't find it on the Play Store by searching for the name and the developer's name: if it is not just me then your app is very hard to discover. So I am installing it through the link you provided, which directed me to an "install success" page saying "your purchase is successful" even though your app is free. Another obstacle to adoption :-) Last, the page did not tell me the app's size. Seeing what it does and the time it takes to download, I am afraid it could be huge? Third obstacle :-) |
| |
| ▲ | blazingbanana 2 days ago | parent | next [-] | | Thank you for the feedback, I really do appreciate you taking the time to check it out and write out the comment! I'll look at adding a note about total app size in the description, it won't hurt. As for discoverability / the "your purchase is successful" message, I'm not sure what else I can do, I've set it to free, no ads etc in Google Play. Maybe I need to hit a few more keywords for transcription so it surfaces it more. | | |
| ▲ | thenthenthen a day ago | parent [-] | | The iOS Appstore also treats/words app installs as ‘Purchases’. Always confused my… |
| |
| ▲ | hurflmurfl 2 days ago | parent | prev [-] | | For me, searching for "whistle" on the Play Store, I get the app as the third result (ignoring sponsored crap). Searching for "blazingbanana" gets me the app as the first result. App info shows 218MB size, which I suppose is about what I'd expect for a model+app code :shrug: | | |
| ▲ | blazingbanana 2 days ago | parent [-] | | Good to know, it's hard to tell what real users would see in the Play Store, as opposed to Google just showing you what you want. Thank you for checking it out. |
|
|
|
| ▲ | firefoxd 2 days ago | parent | prev | next [-] |
| Pretty cool. I've downloaded and lightly tested it. Works great. I love the "free forever, no ads..." part, but it obscures what the app is for. Maybe start with "Speech to text transcription" to make it clearer. Either way, that's just semantics. Great job |
| |
| ▲ | blazingbanana a day ago | parent [-] | | Thank you, really appreciate the kind words. I'll take a look at giving the description a bit of a once over for the next release coming soon. |
|
|
| ▲ | figmert a day ago | parent | prev | next [-] |
| It'd be nice to keep the voice recording too, as I noticed at least one thing that it transcribed wrong. This way one can listen to the recording again, and correct such issues. |
| |
| ▲ | blazingbanana a day ago | parent [-] | | Great idea and an option I'm looking at implementing soon with the ability to reprocess with a different model if needed. Cheers for taking a look. | | |
| ▲ | figmert a day ago | parent [-] | | By the way, how does this handle conversations between two or more people? | | |
|
|
|
| ▲ | wosc a day ago | parent | prev | next [-] |
| That's very cool, I've been looking for a fully offline transcription app for quite a while. Thanks for building this! And thanks so much for providing an "import audio file" function, not just "record from mic" -- transcribing voice notes from various messenger apps is my main use case here. Do you have an idea about supporting languages other than English? |
| |
| ▲ | blazingbanana 15 hours ago | parent [-] | | Thank you, glad you like it! The average model and upwards should support all languages from the whisper models by default. I haven't tested them all so I'm unsure of the quality, however it should in theory support the following: --- Albanian Amharic Arabic Armenian Assamese Azerbaijani Bashkir Basque Belarusian Bengali Bosnian Breton Bulgarian Cantonese Catalan Chinese Croatian Czech Danish Dutch English Estonian Faroese Finnish French Galician Georgian German Greek Gujarati Haitian creole Hausa Hawaiian Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Javanese Kannada Kazakh Khmer Korean Lao Latin Latvian Lingala Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Mongolian Myanmar Nepali Norwegian Nynorsk Occitan Pashto Persian Polish Portuguese Punjabi Romanian Russian Sanskrit Serbian Shona Sindhi Sinhala Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tagalog Tajik Tamil Tatar Telugu Thai Tibetan Turkish Turkmen Ukrainian Urdu Uzbek Vietnamese Welsh Yiddish Yoruba --- Apologies for the formatting, not sure how to make it look nice in the comment. A new bugfix update for the "Translate to English" toggle (which was functionally always set to on) should be available soon, it's just awaiting Play Store approval. |
|
|
| ▲ | figmert a day ago | parent | prev | next [-] |
| I just tried running this on a 30-minute meeting with some 10 people in it. It got to the end, then just bailed without transcribing. I also did not get any errors or anything. |
| |
| ▲ | blazingbanana a day ago | parent [-] | | Really sorry about that, longer running audio (~10m+) is something I'm working on along with handling multiple speakers. I've been focused on getting functional parity across all OS's since the Android release. This is very close to being done and I just need to reach the milestone of it being available on all platforms before I move forward. Hopefully you will take another look when the next update is out. |
|
|
| ▲ | buildcaptive a day ago | parent | prev | next [-] |
| @blazingbanana We have a similar product in the construction space. Would love to talk to you about some of our challenges and possibly work together. Interested? |
| |
|
| ▲ | mysfi a day ago | parent | prev | next [-] |
| I really liked Wispr Flow on my Mac but my daily driver is Manjaro KDE. I have stitched together a bash script that copies the transcription (right now I am using the Parakeet TDT 0.6B) to my clipboard. I would give this a try on Linux when it becomes available. |
| |
| ▲ | pstroqaty a day ago | parent | next [-] | | Would you be open to sharing your script? I run whisper.cpp in Linux through some stitched together scripts (https://news.ycombinator.com/item?id=44949314), but would be very curious to try Parakeet. I don't believe I can run it through whisper.cpp? | | | |
| ▲ | blazingbanana a day ago | parent | prev [-] | | Just checked out Wispr Flow, I must say that looks really nice, kudos to those devs. Shame there isn't a Linux / Android version. I have added auto-copy to clipboard functionality that will come with the next Android release and be included in all the others. Adding a hotkey / quickbar button is on the roadmap for the desktop versions. If you want to give the Linux version a shot, you can download it from here - https://downloads.formait.app/whistle/linux/WhistleDesktop-l... - I've just stuck it in the same R2 bucket as another app, as I've not sorted the proper pipeline out yet. |
|
|
| ▲ | abdullahkhalids a day ago | parent | prev | next [-] |
| Would you consider adding it to F-Droid? |
| |
| ▲ | blazingbanana a day ago | parent [-] | | Yes absolutely! I'm a GrapheneOS user myself so understand not wanting to have to go through the play store if you can help it. I believe you have to make the source code public (please correct me if I'm wrong). I'm more than happy to do so, I've used a whole bunch of open source stuff to build the app so it only seems fair, I just need to make it a bit less messy and something I don't mind being public. | | |
| ▲ | kragen a day ago | parent [-] | | Yes, not just public, but also licensed under a license that permits free redistribution, modification, etc. This is awesome! |
|
|
|
| ▲ | twaldecker a day ago | parent | prev [-] |
| Nice app! When I talk in German, the text gets translated to English. Didn't expect that. |
| |
| ▲ | blazingbanana 16 hours ago | parent [-] | | Thank you! There was a bug causing the "Translate to English" toggle to always be enabled. With the fix it should work correctly and transcribe in your native language. It will be in the next update (in a day or two). |
|