scuff3d 6 hours ago

How anybody can read stuff like this and still take all this seriously is beyond me. This is becoming the engineering equivalent of astrology.

energy123 2 hours ago | parent | next

Anthropic recommends doing magic invocations: https://simonwillison.net/2025/Apr/19/claude-code-best-pract...

It's easy to see why they work. The magic invocation increases test-time compute (easy to verify yourself - try!). And an increase in test-time compute is demonstrated to increase answer correctness (see any benchmark).
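
A minimal way to check the compute claim yourself, assuming the standard OpenAI Python client; the model name, question, and "magic" suffix are just placeholders:

    # Compare completion token counts with and without a reasoning phrase.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    QUESTION = "How many ways can 8 rooks sit on a chessboard without attacking each other?"

    for suffix in ("", " Think step by step and check your work."):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": QUESTION + suffix}],
        )
        print(f"suffix={suffix!r}: {resp.usage.completion_tokens} completion tokens")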

It might surprise you to know that the only difference between GPT 5.2-low and GPT 5.2-xhigh is one of these magic invocations. But that's not supposed to be public knowledge.

gehsty 34 minutes ago | parent

I think this was more of a thing on older models. Since I started using Opus 4.5 I have not felt the need to do this.

fragmede 6 hours ago | parent | prev

Feel free to run your own tests and see if the magic phrases do or do not influence the output. Have it make a Todo webapp with and without those phrases and see what happens!

scuff3d 5 hours ago | parent

That's not how it works. It's not on everyone else to prove claims false; it's on you (or the people who argue any of this has a measurable impact) to prove it actually works. I've seen a bunch of articles like this, and even more comments, and nobody has produced any measurable metric of quality for one approach versus another. It's all just vibes.

Without something quantifiable, it's not much better than someone who always wears the same jersey when their favorite team plays and swears they play better because of it.
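
The bar isn't even high. Something like this sketch would do, where the agent call and the test suite are hypothetical stand-ins for whatever you actually use:

    # Run each prompt variant N times against the same fixed test suite
    # and report pass rates, instead of eyeballing one run.
    import statistics

    PLAIN_PROMPT = "Write a function that ..."      # your real task here
    MAGIC_SUFFIX = " Think hard before answering."  # the phrase under test
    N = 30  # enough runs to see past single-run noise

    def generate_solution(prompt: str) -> str:
        raise NotImplementedError  # stand-in: call your agent, return its code

    def run_test_suite(code: str) -> bool:
        raise NotImplementedError  # stand-in: run your tests, return pass/fail

    for name, prompt in [("plain", PLAIN_PROMPT), ("magic", PLAIN_PROMPT + MAGIC_SUFFIX)]:
        passes = [run_test_suite(generate_solution(prompt)) for _ in range(N)]
        print(f"{name}: {statistics.mean(passes):.0%} pass rate over {N} runs")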

yaku_brang_ja 3 hours ago | parent | next

These coding agents are literally Language Models. The way you structure your prompting language affects the actual output.

guiambros 4 hours ago | parent | prev | next

If you read the transformer paper, or pick up any book on NLP, you will see that these are not magic incantations; it's purely the attention mechanism at work. Or you can just ask Gemini or Claude why these prompts work.

But I get the impression from your comment that you have a fixed idea, and you're not really interested in understanding how or why it works.

If you think like a hammer, everything will look like a nail.

scuff3d 3 hours ago | parent

I know why it works, to varying and unmeasurable degrees of success. Just like if I poke a bull with a sharp stick, I know it's gonna get its attention. It might choose to run away from me in any number of directions, or it might decide to turn around and gore me to death. I can't answer that question with any more certainty than you can.

The system is inherently non-deterministic. Just because you can guide it a bit doesn't mean you can predict outcomes.

guiambros 2 hours ago | parent | next

> The system is inherently non-deterministic.

The system isn't randomly non-deterministic; it is statistically probabilistic.

Next-token prediction and the attention mechanism are actually rigorous, deterministic mathematical processes. The variation in output comes from how we sample from the resulting probability distribution, and from the temperature used to sharpen or flatten it. Because the underlying probabilities are mathematically calculated, the system's behavior remains highly predictable within statistical bounds.

Yes, it's a departure from the fully deterministic systems we're used to. But that's no different from many real-world systems: weather, biology, robotics, quantum mechanics. Even the computer you're reading this on right now is full of probabilistic processes, abstracted away through sigmoid-like functions that push the extremes to 0s and 1s.
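
A toy version of that split, with made-up logits (numpy): everything up to the distribution is deterministic; only the final draw from it is random.

    # The distribution is a pure, deterministic function of the logits and
    # the temperature; only the final draw from it is random.
    import numpy as np

    def softmax_with_temperature(logits, temperature):
        z = np.asarray(logits) / temperature
        z -= z.max()               # for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = [2.0, 1.0, 0.1]       # fixed model output for some context
    probs = softmax_with_temperature(logits, temperature=0.8)
    print(probs)                   # identical on every run

    rng = np.random.default_rng()  # unseeded: each run may pick a different token
    print(rng.choice(len(probs), p=probs))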

winrid 3 hours ago | parent | prev

But we can predict the outcomes. That's what we're saying, and it's true. Maybe not 100% of the time, but it helps a significant amount of the time, and that's what matters.

Is it engineering? Maybe not. But neither is knowing how to talk to junior developers so they're productive and don't feel bad. The engineering is at other levels.

tokioyoyo 4 hours ago | parent | prev

Do you actively use LLMs to do semi-complex coding work? Because if not, it will sound like mumbo-jumbo to you. Everyone else can nod along and read on, as they've experienced all of it first hand.

scuff3d 4 hours ago | parent

You've missed the point. This isn't engineering, it's gambling.

You could take the exact same documents, prompts, and whatever other bullshit, run it on the exact same agent backed by the exact same model, and get different results every single time. Just like you can roll dice the exact same way on the exact same table and get two totally different results. People are doing their best to constrain that behavior by layering stuff on top, but the foundational tech is flawed (or at least ill suited for this use case).

That's not to say that AI isn't helpful. It certainly is. But when you are basically begging your tools to please do what you want with magic incantations, we've lost the fucking plot somewhere.

geoelectric an hour ago | parent | next

I think that's a pretty bold claim, that it'd be different every time. I'd think the output would converge on a small set of functionally equivalent designs, given sufficiently rigorous requirements.

And even a human engineer might not solve a problem the same way twice in a row, based on changes in recent inspirations or tech obsessions. What's the difference, as long as it passes review and does the job?

gf000 3 hours ago | parent | prev

> You could take the exact same documents, prompts, and whatever other bullshit, run it on the exact same agent backed by the exact same model, and get different results every single time

This is more of an implementation detail, done this way to get better results. A neural network with fixed weights (and deterministic floating-point operations) returns a probability distribution; if you sample from it recursively using a pseudorandom generator with a fixed seed, it will always return the same output for the same input.
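
A toy sketch of the point (toy_logits is a made-up stand-in for the network; only the fixed seed matters for the argument):

    # Toy decoding loop: fixed "weights" plus a fixed seed gives the same
    # output on every run. Any deterministic function of the context works.
    import numpy as np

    def toy_logits(tokens):
        return np.array([len(tokens) % 3 + 1.0, 2.0, 0.5])

    def generate(seed, steps=10):
        rng = np.random.default_rng(seed)
        tokens = [0]
        for _ in range(steps):
            z = toy_logits(tokens)
            p = np.exp(z - z.max())
            p /= p.sum()
            tokens.append(int(rng.choice(len(p), p=p)))
        return tokens

    print(generate(seed=42))
    print(generate(seed=42))  # same seed, same input -> same output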