| ▲ | BeetleB a day ago | |
Similar story, albeit not so extreme. I have similar ergonomic issues that crop up from time to time. My programming is not so impacted (spend more time thinking than typing, etc), but things like email, documentation, etc can be brutal (a lot more computer usage vs programming). My simple solution: I use Whisper to transcribe my text, and feed the output to an LLM for cleanup (custom prompt). It's fantastic. Way better than stuff like Dragon. Now I get frustrated with transcribing using Google's default mechanism on Android - so inaccurate! But the ability to take notes, dictate emails, etc using Whisper + LLM is invaluable. I likely would refuse to work for a company that won't let me put IP into an LLM. Similarly, I take a lot of notes on paper, and would have to type them up. Tedious and painful. I switched to reading my notes aloud and use the above system to transcribe. Still painful. I recently realized Gemini will do a great job just reading my notes. So now I simply convert my notes to a photo and send to Gemini. I categorize all my expenses. I have receipts from grocery stores where I highlight items into categories. You can imagine it's painful to enter that into a financial SW. I'm going to play with getting Gemini to look at the photo of the receipt and categorize and add up the categories for me. All of these are cool applications on their own, but when you realize they're also improving your health ... clear win. | ||
| ▲ | mr-wendel 20 hours ago | parent | next [-] | |
> I'm going to play with getting Gemini to look at the photo of the receipt and categorize and add up the categories for me. FWIW, I have a pet project for a family recipe book. I normalize all recipes to a steps/instructions/ingredients JSON object. A webapp lets me snap photos of my old recipes and AI reliably yields perfectly structured objects back. The only thing I've had to fix is odd punctuation. For production, use is low, so `gemini-2.5-flash` works great and the low rate limits are fine. For development the `gemma-3-27b-it` model has MUCH higher limits and still does suprisingly well. I'd bet you can pull this off and be very happy with the result. | ||
| ▲ | nunez 14 hours ago | parent | prev [-] | |
I maintain expense tracking software that I wrote a while ago (before ChatGPT) that sends receipts and some metadata about them into Google Sheets (previously Expensify). A few months ago, I used Claude to add a feature that does exactly what you describe, but using the data types and framework I built for receipt parsing. It works really well. Honestly, you can probably build what I built entirely with Gemini or Claude, probably with a nice frontend to boot. | ||