wittlesus 2 hours ago
This is genuinely exciting. The fact that you're getting 15-30 tok/s for text gen on phone hardware is wild — that's basically usable for real conversations. Curious about a couple things: what GGUF model sizes are practical on a mid-range phone (say 8GB RAM)? And how's the battery impact during sustained inference — does it drain noticeably faster than, say, a video call? The privacy angle is the real killer feature here IMO. There are so many use cases (journaling, health tracking, sensitive work notes) where people self-censor because they know it's going to a server somewhere. Removing that barrier entirely changes what people are willing to use AI for.
durhamg 2 hours ago
This sounds exactly like Claude wrote it. I've noticed Claude saying "genuinely" a lot lately, and the "real killer feature" segue just feels like Claude being asked to review something.
ali_chherawalla 2 hours ago
I've added a section with recommended models, so you can choose from there. I'd recommend any quantized 1B-parameter model: Llama 3.2 1B, Gemma 3 1B, or Qwen3-VL 2B if you'd like vision. Appreciate the kind words!
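For a rough sense of why 1B-class models are the sweet spot on an 8 GB phone, here's a back-of-envelope sketch. All the constants below (effective bits per weight, layer count, KV dimensions, runtime overhead) are illustrative assumptions, not measured numbers from the app:

```python
# Back-of-envelope RAM estimate for running a quantized GGUF model on-device.
# Every constant here is an assumption for illustration, not a measurement.

def estimate_ram_gb(params_billion: float, bits_per_weight: float,
                    ctx_tokens: int = 4096, n_layers: int = 16,
                    kv_dim: int = 2048, kv_bytes: int = 2) -> float:
    """Approximate resident memory: quantized weights + KV cache + overhead."""
    weights = params_billion * 1e9 * bits_per_weight / 8      # quantized weights
    kv_cache = 2 * n_layers * ctx_tokens * kv_dim * kv_bytes  # K and V, fp16
    overhead = 0.5e9                                          # runtime buffers (assumed)
    return (weights + kv_cache + overhead) / 1e9

# A 1B model at ~4.5 effective bits/weight (Q4-class) fits easily in 8 GB:
print(f"1B @ Q4: ~{estimate_ram_gb(1.0, 4.5):.1f} GB")
# A 7B model at the same quantization is already tight once the OS takes its share:
print(f"7B @ Q4: ~{estimate_ram_gb(7.0, 4.5, n_layers=32, kv_dim=4096):.1f} GB")
```

The takeaway is that weights dominate: a Q4-quantized 1B model needs well under 2 GB resident, leaving headroom for the OS and other apps, while a 7B model at the same quantization already crowds an 8 GB device.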
add-sub-mul-div 2 hours ago
> that's basically usable for real conversations.

That's using the word "real" very loosely.