blindhippo 3 days ago

Might work for you, but if I multitask too much, the quality of my output drops significantly. Where I work, that does not fly. I cannot trust any agent to handle anything without babysitting them to avoid going off the rails - but perhaps the tools I have access to just aren't good (the underlying model is Claude 4.5, so the model isn't the cause).

I've said this in the past and I'll continue to say it - until the tools get far better at managing context, their value will stay capped in most use cases. The moment I see "summarizing conversation" I know I'm about to waste 20 minutes fixing code.

dionian 3 days ago | parent | next [-]

I think it depends on the project and the context, but I developed my own task management system particularly because of this challenge. I'm starting to extend this with verification gates as well.
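I don't know the details of your setup, but for anyone curious, here is a minimal sketch of what "tasks with verification gates" could look like, assuming each task carries a shell command that has to exit cleanly before the task is allowed to be marked done. The task titles and gate commands are illustrative, not from my actual system.

    # Hypothetical sketch: tasks paired with a verification gate.
    # A task is only marked "done" if its gate command exits 0.
    import subprocess
    from dataclasses import dataclass

    @dataclass
    class Task:
        title: str
        gate: str          # shell command that must pass before closing the task
        done: bool = False

    def close_task(task: Task) -> bool:
        """Run the verification gate; only mark the task done if it passes."""
        result = subprocess.run(task.gate, shell=True)
        task.done = (result.returncode == 0)
        return task.done

    tasks = [
        Task("Add retry logic to the API client", gate="pytest tests/ -q"),
        Task("Clean up type errors in the client module", gate="mypy src/"),
    ]

    for t in tasks:
        if close_task(t):
            print(f"DONE: {t.title}")
        else:
            print(f"BLOCKED (gate failed): {t.title}")

The point of the gate is that the agent never gets to self-report success; an external check decides whether the task actually closes.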

If I worked on different types of systems with different types of tasks, I might feel the same way as you. I think AI works well in specific, targeted use cases where some amount of hallucination can be tolerated and addressed.

What models are you using? I use Opus 4.5, which can one-shot a surprising fraction of tasks.

fragmede 3 days ago | parent | prev [-]

If you can predict that hitting “summarize conversation” equals rework, what can you change upstream so you avoid triggering it? Are you relying on the agent to carry state instead of dumping it into .MD files? What happens if your computer crashes?
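For what it's worth, a minimal sketch of the "dump state into .md files" idea, assuming you keep a running notes file that the agent re-reads at the start of each session instead of relying on its conversation history. The file name and fields here are made up.

    # Hypothetical sketch: persist working state to a markdown file so a
    # crash or a context compaction doesn't lose it; the agent re-reads
    # this file at the start of each fresh session.
    from datetime import datetime
    from pathlib import Path

    STATE_FILE = Path("PROGRESS.md")  # illustrative name

    def log_state(decision: str, next_step: str) -> None:
        """Append the latest decision and next step to the state file."""
        stamp = datetime.now().isoformat(timespec="minutes")
        with STATE_FILE.open("a", encoding="utf-8") as f:
            f.write(f"## {stamp}\n- Decision: {decision}\n- Next: {next_step}\n\n")

    def load_state() -> str:
        """Return the accumulated notes to paste into a fresh session."""
        return STATE_FILE.read_text(encoding="utf-8") if STATE_FILE.exists() else ""

    log_state("Use SQLite instead of Postgres for the prototype",
              "Write the migration script")
    print(load_state())

That way the expensive context lives on disk, not in the model's window, and a "summarizing conversation" event costs you nothing.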

> so the model isn't the cause

Thing is, the prompts - those stupid little bits of English that can't possibly matter all that much? It turns out they affect the model's performance a ton.