zachlatta 8 hours ago

I just learned about Handy in this thread and it looks great!

I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.

This fixes misspelled names when you're replying to an email, makes sure technical terms are spelled correctly, and so on.

The original hope was for FreeFlow to use all-local models like Handy does, but with the post-processing step the local pipeline took 5-10 seconds, versus under 1 second with Groq.
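
(For anyone curious, the post-processing step is conceptually something like the sketch below. This isn't FreeFlow's actual code; the model name and prompt are placeholders, though Groq's Python SDK does expose an OpenAI-style chat API.)

    # Hedged sketch: correct a raw transcript using context taken from the
    # active window. Model name and prompt are assumptions, not FreeFlow's
    # real pipeline.
    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment

    def post_process(raw_transcript: str, window_context: str) -> str:
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile",  # placeholder model name
            messages=[
                {"role": "system", "content": (
                    "Fix transcription errors. Use the screen context below to "
                    "correct names and technical terms; do not add new content.\n\n"
                    "Screen context:\n" + window_context
                )},
                {"role": "user", "content": raw_transcript},
            ],
        )
        return resp.choices[0].message.content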

sipjca 4 hours ago | parent | next [-]

There's an open PR in the repo, which will be merged, that adds this support. Post-processing is an optional feature, and when it's enabled, end-to-end latency can still easily be under 3 seconds.

zachlatta 4 hours ago | parent [-]

That’s awesome! The specific thing causing the long latency was the image LLM call to describe the current context. I’m not sure if you’ve tested Handy’s post-processing with images, or if there’s a technique for making image calls faster locally.

Thank you for making Handy! It looks amazing, and I wish I had found it before making FreeFlow.

lemming 6 hours ago | parent | prev | next [-]

Could you go into a little more detail about the deep context: what does it grab, and which model is used to process it? Are you also using a Groq model for the transcription?

zachlatta 4 hours ago | parent [-]

It takes a screenshot of the current window and sends it to Llama on Groq, asking it to describe what you’re doing and to pull out key info like names and their correct spellings.

You can go to Settings > Run Logs in FreeFlow to see the full pipeline run for each request, including the exact prompt and LLM response, so you can see exactly what is sent and returned.
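
(Roughly, that screenshot step looks like the sketch below. Again, this is an illustrative approximation rather than FreeFlow's code; the vision model name is a placeholder.)

    # Hedged sketch of the screenshot-description step: send the current
    # window's screenshot to a vision model on Groq and ask for a short
    # description plus any names spelled exactly. Model name is a placeholder.
    import base64
    from groq import Groq

    client = Groq()

    def describe_window(screenshot_png: bytes) -> str:
        b64 = base64.b64encode(screenshot_png).decode()
        resp = client.chat.completions.create(
            model="llama-3.2-90b-vision-preview",  # placeholder vision model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": (
                        "Briefly describe what the user is doing and list any "
                        "names or technical terms visible, spelled exactly."
                    )},
                    {"type": "image_url",
                     "image_url": {"url": "data:image/png;base64," + b64}},
                ],
            }],
        )
        return resp.choices[0].message.content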

stavros 8 hours ago | parent | prev [-]

As a very happy Handy user, I can confirm it doesn't do that. It will be interesting to see if FreeFlow works better; I'll give it a shot, thanks!