Remix.run Logo
dkdcio 15 hours ago

I have tons of examples of AI not committing secrets. this is one screenshot from twitter? I don’t think it makes your point

CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works

Mawr 14 hours ago | parent | next [-]

I have tons of examples of drivers not running into objects.

dkdcio 14 hours ago | parent [-]

like my other comment, my point is one screenshot from twitter vs one anecdote. neither proves anything. cool snarky response though!

rvz 15 hours ago | parent | prev [-]

> I have tons of examples of AI not committing secrets.

"Trust only me bro".

It takes 10 seconds to see the many examples of API keys + prompts on GitHub to verify that tweet. The issue with AI isn't limited to that tweet which demonstrates its probabilistic nature; Otherwise why do need a sandbox to run the agent in the first place?

Nevermind, we know why: Many [0] such [1] cases [2]

> CPUs are billions of transistors. sometimes one fails and things still work. “probabilistic quicksand” isn’t the dig you think it is to people who know how this stuff works

Except you just made a false equivalence. CPUs can be tested / verified transparently and even if it does go wrong, we know exactly why. Where as you can't explain why the LLM hallucinated or decided to delete your home folder because the way it predicts what it outputs is fundamentally stochastic.

[0] https://old.reddit.com/r/ClaudeAI/comments/1pgxckk/claude_cl...

[1] https://old.reddit.com/r/ClaudeAI/comments/1jfidvb/claude_tr...

[2] https://www.google.com/search?q=ai+deleted+files+site%3Anews...

dkdcio 14 hours ago | parent [-]

you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)

my point is more “skill issue” than “trust me this never happens”

my point on CPUs is people who don’t understand LLMs talk like “hallucinations” are a real thing — LLMs are “deciding” to make stuff up rather than just predicting the next token. yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are. can you really explain in detail how everything you use works? I’m guessing I can explain failure modes of agentic systems (and how to avoid them so you don’t look silly on twitter/github) and how neural networks work better than most people can explain the technology they use every day

rvz 13 hours ago | parent [-]

> you could find tons of API keys on GitHub before these “agentic” tools too. that was my point, one screenshot from twitter vs one anecdote from me. I don’t think either proves the point, but posting a screenshot from twitter like it’s proof of some widespread problem is what I was responding to (N=2, 1 vs 1)

That doesn't refute the probabilistic nature of LLMs despite best prompting practices. In fact it emphasises it. More like your 1 anecdotal example vs my 20+ examples on GitHub.

My point tells you that not only it indeed does happen, but a previous old issue is now made even worse and more widespread, since we now have vibe-coders without security best practices assuming the agent should know better (when it doesn't).

> my point is more “skill issue” than “trust me this never happens”

So those that have this "skill issue" are also those who are prompting the AI differently then? Either way, this just inadvertently proves my whole point.

> yes it’s probabilistic, so is practically everything else at scale. yet it works and here we are.

The additional problem is can you explain why it went wrong as you scale the technology? CPUs circuit design go through formal verification and if a fault happens, we know exactly why; hence it is deterministic in design which makes them reliable.

LLMs are not and don't have this. Which is why OpenAI had to describe ChatGPT's misaligned behaviour as "sycophancy", but could not explain why it happened other than tweaking the hyper-parameters which got them that result.

So LLMs being fundamentally probabilistic and are hence, more unexplainable being the reason why you have the screenshot of vibe-coders who somehow prompted it wrong and the agent committed the keys.

Maybe that would never have happened to you, but it won't be the last time we see more of this happening on GitHub.

dkdcio 13 hours ago | parent [-]

I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.

yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution

I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic. you can still practically use the tools to great effect, just like we use everything else that has underlying probabilities

OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice

and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction

rvz 12 hours ago | parent [-]

> I was pointing out one screenshot from twitter isn’t proof of anything just to be clear; it’s a silly way to make a point.

Verses your anecdote being a proof of what? Skill issue for vibe coders? Someone else prompting it wrong?

You do realize you are proving my entire point?

> yes AI makes leaking keys on GH more prevalent, but so what? it’s the same problem as before with roughly the same solution

Again, it exacerbates my point such that it makes the existing issue even worse. Additionally, that wasn't even the only point I made on the subject.

> I’m saying neural networks being probabilistic doesn’t matter — everything is probabilistic.

When you scale neural networks to become say, production-grade LLMs, then it does matter. Just like it does matter for CPUs to be reliable when you scale them in production-grade data centers.

But your earlier (fallacious) comparison ignores the reliability differences between them (CPUs vs LLMs.) and determinism is a hard requirement for that; which the latter, LLMs are not.

> OpenAI did not have to describe it as sycophancy, they chose to, and I’d contend it was a stupid choice

For the press, they had to, but no-one knows the real reason, because it is unexplainable; going back to my other point on reliability.

> and yes, you can explain what went wrong just like you can with CPUs. we don’t (usually) talk about quantum-level physics when discussing CPUs; talking about neurons in LLMs is the wrong level of abstraction

It is indeed wrong for LLMs because not even the researchers can practically give an explanation why a single neuron (for every neuron in the network) gives different values on every fine-tune or training run. Even if it is "good enough", it can still go wrong at the inference-level for other unexplainable reasons other than it "overfitted".

CPUs on the other hand, have formal verification methods which verify that the CPU conforms to its specification and we can trust that it works as intended and can diagnose the problem accurately without going into atomic-level details.

dkdcio 11 hours ago | parent [-]

…what is your point exactly (and concisely)? I’m saying it doesn’t matter it’s probabilistic, everything is, the tech is still useful

rvz 11 hours ago | parent [-]

No one is arguing that it isn't useful. The problem is this:

> I’m saying it doesn’t matter it’s probabilistic, everything is,

Maybe it doesn't matter for you, but it generally does matter.

The risk level of a technology failing is far higher if it is more random and unexplainable than if it is expected, verified and explainable. The former eliminates many serious use-cases.

This is why your CPU, or GPU works.

LLMs are neither deterministic, no formal verification exists and are fundamentally black-boxes.

That is why many vibe-coders reported many "AI deleted their entire home folder" issues even when they told it to move a file / folder to another location.

If it did not matter, why do you need sandboxes for the agents in the first place?

dkdcio 10 hours ago | parent [-]

I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)

very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)

rvz 3 hours ago | parent [-]

> I think we agree then? the tech is useful; you need systems around them (like sandboxes and commit hooks that prevent leaking secrets) to use them effectively (along with learned skills)

No.

> very little software (or hardware) used in production is formally verified. tons of non-deterministic software (including neural networks) are operating in production just fine, including in heavily regulated sectors (banking, health care)

It's what happens when it all goes wrong.

You have to explain exactly why, a system failed in heavily regulated sectors.

Saying 'everything is probabilistic' as the reason for the cause of an issue, is a non answer if you are a chip designer, air traffic controller, investment banker or medical doctor.

So your point does not follow.

dkdcio 3 hours ago | parent [-]

that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point (in the second part of your response; I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine. they can be explained despite your repeated assertions they cannot be. also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf

rvz 2 hours ago | parent [-]

> that’s not what I said. you honestly seem like you just want to argue about stuff (e.g. not elaborating on the “no” when I basically repeated and agreed with what you said). and you seem to consistently miss my point

I have repeated myself many times and you decide to continue to ignore the reliability points that inherently impede LLMs in many use-cases which exclude them in areas where predictability in critical systems is required in production.

Vibe coders can use them, but the gulf between useful for prototyping and useful for production is riddled with hard obstacles as such a software like LLMs are fundamentally unpredictable hence the risks are far greater.

> I’m saying these non-deterministic neural networks are already widespread in industry with these regulations, and it’s fine.

So when a neural network scales beyond hundreds of layers and billions of parameters, equivalent to a production-grade LLM, explain exactly how is such a black-box on that scale explainable when it messes up and goes wrong?

> they can be explained despite your repeated assertions they cannot be.

With what methods exactly?

Early on, I said formal verification and testing on CPUs for explaining when they go wrong at scale. It is you that provided absolutely nothing of your own assertions with the equivalent for LLMs other than "they can be explained" without providing any evidence.

> also the entire point on CPUs which you may have noticed I dropped from my responses because you seemed distracted arguing about it). this is not productive and we’re both clearly stubborn, glhf

You did not make any point with that as it was a false equivalence, and I explained why the reliability of a CPU isn't the same as the reliability of a LLM.