hakfoo 7 days ago

That feels like a market failure, though. For a tool to be a useful extension of the user, it should work the way the user expects, without a huge amount of realigning and repackaging your normal process.

Maybe that's something we can hope for in the next generation of LLM products. Right now, the race seems to be all about performance and capability, but maybe when we reach a plateau of performance, vendors can start differentiating by building tools with clearer voices and expectations: focused system prompts and training, maybe. If you know DeepSeek will follow your requests fairly literally, while Qwen will start adding best-effort tweaks, you can decide which one is the right choice for a given task.

The other day I asked Claude to read two logs and assemble them into a single table for easy reading. It takes me about 30 seconds to pull up and toggle between the logs normally, but I figured it would be nice to have a skill that lets the machine crunch it all onto a single page. After 5 minutes, it spat out a ball of Markdown with half the content truncated, and it summarized things in a way I didn't ask for and had no interest in.

If I had asked a human to do it, there's no way they would have ended up there, because doing the wrong thing is literally more effort. Maybe the model did those things because "typical" requests want summarization, so it's the implicit default, but IT SHOULDN'T BE MY RESPONSIBILITY TO GUESS THIS.

sfn42 6 days ago

You're just expecting too much. If a task takes you 30 seconds to do, you're almost certainly better off doing it yourself than getting an LLM to do it. If it's a recurring task, it might make sense to create a skill for it, and this is exactly the use case for skills. Give precise instructions so it does the task correctly, and save them so you can do it again easily.

I don't really get how you guys can be so demanding - this technology is magic. It's doing things that 5 years ago we could only dream of. It still blows my mind every time I paste a screenshot of some vague issue along with a quick and dirty prompt and it just gets it and gives me the right answer immediately.

In the hands of a competent user these things are absolutely incredible. I can develop solutions faster, with higher quality and less effort. So honestly, man, all you guys complaining that they aren't good enough? I can't help but think you must really not be very competent. Complaining about problems while the solution is staring you in the face.

hakfoo 5 days ago

> I don't really get how you guys can be so demanding - this technology is magic

That could be the problem. I suspect a lot of developers have spent years developing workflows and understanding based on the idea that the machine is precise, repeatable, and does exactly what it's told. "Magic" is a very poor match for that strategy.

> Complaining about problems while the solution is staring you in the face.

I'm not quite sure what the "solution" is here. Am I supposed to restyle the prompt to be "quick and dirty" to give Claude more room to stretch and hopefully hit my desired goal? Or am I supposed to iterate repeatedly on the skill, adding a harness of "don't truncate that, don't add a summary, etc." until it behaves how I want?

I'm not saying you're wrong. I think it's almost more like the difference between programming languages. If you come to writing FORTRAN with a Tcl/Tk mindset, you're going to have a hard time getting what you want, but the industry understood that and built environments for both. I suspect that right now, since the big market is outside the hardcore programmer market, vendors are going to focus on the "it does magic with vague prompts" version before the "it's reliable and precise with specific prompts" one.