testdelacc1 10 days ago

Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.

sevenzero 10 days ago | parent | next [-]

I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...

apsurd 10 days ago | parent | next [-]

Makes me think of how my Claude.md file specifies using the built-in framework code generators (Rails). Those generators are deterministically right every time.

I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.

thefunnyman 10 days ago | parent [-]

This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. The more you can remove the "choice" the better, though there's still a lot of randomness in how reliably it invokes skills, in my experience.

internet101010 9 days ago | parent [-]

Hooks are incredibly underused by most people and are the easiest way to establish a first line of defense against bad behavior. Things like blocking tool calls that will read a .env file or execute "create or replace table".
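
A minimal sketch of the kind of deterministic hook script described above. It assumes the agent passes the pending tool call as JSON on stdin with a `tool_input` object, and that a non-zero exit code of 2 tells the agent to block the call; both are assumptions to check against your tool's hook documentation. The names `block_reason`, `SECRET_FILE` and `DESTRUCTIVE_SQL` are made up for illustration.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical deny rules; adjust them to your own project.
SECRET_FILE = re.compile(r"(?:^|[\s/=])\.env(?:\.[\w-]+)*(?:\s|$)")
DESTRUCTIVE_SQL = re.compile(r"create\s+or\s+replace\s+table", re.IGNORECASE)


def block_reason(tool_call: dict) -> Optional[str]:
    """Return a reason to block the tool call, or None to allow it."""
    tool_input = tool_call.get("tool_input", {})
    # Flatten every argument into one string so the check does not
    # depend on field names such as "file_path" or "command".
    text = " ".join(str(value) for value in tool_input.values())
    if SECRET_FILE.search(text):
        return "touching .env files is not allowed"
    if DESTRUCTIVE_SQL.search(text):
        return "CREATE OR REPLACE TABLE is not allowed"
    return None


def main(stream=None) -> int:
    """Read one tool call as JSON and return the exit code for the hook."""
    tool_call = json.load(stream if stream is not None else sys.stdin)
    reason = block_reason(tool_call)
    if reason:
        print(f"Blocked: {reason}", file=sys.stderr)
        return 2  # assumed "block this call" exit code
    return 0
```

To use it as a script, end the file with `sys.exit(main())` and register it as a pre-tool-call hook. The point is that the check runs the same way every time, regardless of what the model decides to do with the md instructions.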

apsurd 8 days ago | parent [-]

I'm implementing this now, thanks. The guides state that more determinism is exactly the intent.

sfn42 10 days ago | parent | prev [-]

A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.

The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.

A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.

sevenzero 10 days ago | parent | next [-]

Nah, the codebase is legacy fucked and I can't be bothered to try and optimize business flows when I have to fear other stuff breaking.

Claude thinks we use Laravel 100% of the time, despite the project being some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.

sfn42 9 days ago | parent [-]

Have you tried adding this information to claude.md so it knows?

I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"

sevenzero 9 days ago | parent | next [-]

Are you some kind of entitled corporate dev that barely has any influence on the codebase? If I fuck up, a whole business goes down, as I am the only dev there currently. We can't afford that happening. Also, why would I mess with anything claude.md related? I just use the CLI tool. LLM enthusiasts always claim how smart these things are, so they should figure it out on their own, you know?

sfn42 9 days ago | parent [-]

I have full control of my codebase. I'm not afraid to make changes to it because I know what I'm doing.

You would edit Claude.md to say things like what tech the project is using, because that's the entire point of claude.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.

I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.

adithyassekhar 9 days ago | parent | prev [-]

This is a spicy take. Unless the business is willing to face some down time, and I am hired to do exactly what you said, I’d never touch any line of code unless I absolutely have to. Different environments don’t help as much.

We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.

sevenzero 9 days ago | parent | next [-]

This is what it's about: we have multiple ecom shops running 24/7 and simply can't afford downtime or a change of business flow that maybe doesn't affect shops A and B but definitely affects shops C and D...

sfn42 9 days ago | parent | prev [-]

> Least important thing for a business

- Takes weeks or months to get simple features out the door, and when they're out they're buggy as hell and the bugs never get fixed. Sound familiar?

> I’d never touch any line of code unless I absolutely have to

And this is how legacy code is made. Years of everyone "never touching anything they don't have to" leads to a giant steaming pile of shit.

> unless the business is willing to face some down time

How does a simple refactor cause downtime? I do this kind of stuff all the time and pretty much never cause any downtime. In the very rare cases that prod downtime does occur it's generally not because of some simple code refactor, and we have it back up in no time by just rolling it back. Unless it's not related to the code at all, in which case it also wasn't a refactor that caused it.

hakfoo 7 days ago | parent | prev | next [-]

That feels like a market failure though. For a tool to be a useful extension of the user, it should work in the way a user expects it, without a huge amount of having to realign and repackage your normal process.

Maybe that's something we can hope for in a next generation of LLM products. Right now, the race seems to be all about performance and capability, but maybe when we get to a plateau of performance, vendors can start differentiating by building tools with clearer voices and expectations: focused system prompts and training, maybe. If you know DeepSeek will follow your requests fairly literally, while Qwen will start adding best-effort tweaks, you can decide which one is the right choice for a given task.

I asked Claude to read two logs and assemble them in a single table for easy reading the other day. It takes me like 30 seconds to pull and toggle between the logs normally, but I figured it would be nice to have a skill to let the machine crunch it all onto a single page. After 5 minutes, it spat up a ball of Markdown with half the content truncated and summarized it in a way I didn't ask for and had no interest in.

If I had asked a human to do it, there's no way it would come to that conclusion because doing the wrong thing is literally more effort. Maybe the model did those things because "typical" requests want summarization so it's the implicit default, but IT SHOULDN'T BE MY RESPONSIBILITY TO GUESS THIS.

sfn42 6 days ago | parent [-]

You're just expecting too much. If a task takes you 30 seconds to do you're almost certainly better off doing it yourself than getting an LLM to do it. If it's a recurring task it might make sense to create a skill for it, and this is exactly the use case for skills. Give precise instructions so it does the task correctly, and save them for later so you can do it again easily.

I don't really get how you guys can be so demanding - this technology is magic. It's doing things that 5 years ago we could only dream of. It still blows my mind every time I paste a screenshot of some vague issue along with a quick and dirty prompt and it just gets it and gives me the right answer immediately.

In the hands of a competent user these things are absolutely incredible, I can develop solutions faster, with higher quality and less effort. So honestly man all you guys complaining that they aren't good enough? I can't help but think you guys must really not be very competent. Complaining about problems while the solution is staring you in the face.

hakfoo 5 days ago | parent [-]

> I don't really get how you guys can be so demanding - this technology is magic

That could be the problem. I suspect a lot of developers have spent years developing workflows and understandings based on the idea the machine is precise, repeatable, and does exactly as it's told. "Magic" is a very poor match for that strategy.

> Complaining about problems while the solution is staring you in the face.

Not quite sure what the "solution" is here. Am I supposed to try to restyle the prompt to be "quick and dirty" to give Claude more room to stretch and hopefully hit my desired goal? Or am I supposed to iterate repeatedly on the skill to add a harness of "don't truncate that, don't add a summary, etc" until it behaves how I want?

I'm not saying you're wrong. I think it's almost more like the difference between programming languages. If you come into writing FORTRAN with a TCL/Tk mindset, you're going to have a hard time getting what you want, but the industry understood that and made environments for both. I suspect right now, since the big market is outside the hardcore programmer market, they're going to focus on the "it does magic with vague prompts" version before the "it's reliable and precise with specific prompts" one.

camdenreslink 9 days ago | parent | prev [-]

There are a lot of instances where you don't want to create an abstraction that will tie two disparate areas of the code together even if they happen to be using a similar pattern you want to copy. For example, when you expect their implementations to diverge in the future.

I have experienced enterprise codebases that have been DRY'd to the point they become ossified.

sfn42 9 days ago | parent [-]

That's why I said "a lot of the time". Not always. And it's not really a problem to de-DRY things: literally just copy/paste and make the change you want. The bigger problem in my eyes is when the requirements start to diverge and people just add an if branch, and soon you have a function/component that does 7 different things depending on how it's used and it's a big buggy mess.

It's also possible in many of these cases to identify sub-patterns you could abstract, to create a set of tools you can compose in different ways in order to satisfy the different use cases. Instead of one function/component you make multiple, and use them together.

All this stuff is just basic programming but I've mostly given up trying to preach about it. Most people don't care, and even if they did care they just don't have the talent to write really good code. It's rare to find a dev who does really solid work. In my experience you either do it because that's who you are, or nothing I say will make any difference.
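
A rough sketch of the "set of tools you can compose" idea from the comment above, as opposed to one function that grows an if branch per use case. The pricing domain and every name here (`percent_discount`, `add_tax`, `flat_fee`, `compose`, `retail_total`, `wholesale_total`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable

# A step takes an amount and returns the adjusted amount.
Step = Callable[[float], float]


def percent_discount(rate: float) -> Step:
    """Build a step that reduces the amount by the given fraction."""
    return lambda amount: amount * (1 - rate)


def add_tax(rate: float) -> Step:
    """Build a step that increases the amount by the given fraction."""
    return lambda amount: amount * (1 + rate)


def flat_fee(fee: float) -> Step:
    """Build a step that adds a fixed fee."""
    return lambda amount: amount + fee


def compose(steps: Iterable[Step]) -> Step:
    """Chain steps so they run in order, each on the previous result."""
    ordered = tuple(steps)  # freeze the order, even if a generator was passed

    def run(amount: float) -> float:
        for step in ordered:
            amount = step(amount)
        return amount

    return run


# Each use case picks the pieces it needs instead of passing flags
# into one shared function that branches on them.
retail_total = compose([percent_discount(0.10), add_tax(0.20)])
wholesale_total = compose([add_tax(0.20), flat_fee(5.0)])
```

When the two use cases diverge further, one of them gets a new step or a different order, and the other is not touched.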
