somesortofthing 13 hours ago

There's still the question of access to the codebase. By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt. The attacker usually has even less access than this - in the beginning, they have network tools, an undocumented API, and maybe some binaries.

You can do a lot better efficiency-wise if you control the source end-to-end though - you already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even do the big bulk scans an attacker might on a fixed schedule - each attacker has to run their own scan while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.
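
A minimal sketch of that diff-scoped approach (the prompt wording, the `SECURITY_HINTS` heuristics, and the helper names are my own illustration, not any real product's behavior):

```python
import subprocess

# Hypothetical heuristic: paths touching these areas get extra per-file effort.
SECURITY_HINTS = ("auth", "crypto", "session", "acl")

def changed_files(base: str = "main") -> list[str]:
    """Ask git which files this branch touched, instead of scanning everything."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def build_prompt(path: str, source: str) -> str:
    """Spend more per-file effort when the path looks security-relevant."""
    effort = "deep" if any(h in path for h in SECURITY_HINTS) else "quick"
    return (
        f"Do a {effort} review of {path} for vulnerabilities "
        f"(injection, authz bypass, unsafe deserialization):\n\n{source}"
    )
```

Each prompt would then go to whatever model you use; the point is only that the file list comes from the diff, not a full-tree walk.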

Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to fix the weakest link to break that chain.

If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages only the best-resourced attackers can overcome. On net, public access to mythos-tier models will make software more secure.

anitil 12 hours ago | parent | next [-]

On the latest episode of 'Security Cryptography Whatever' [0] they mention that time spent improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them

[0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...

jorvi a few seconds ago | parent | next [-]

That seems very unlikely.

Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.

I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.

The real improvements are going to be in tooling and harnessing.

conception 11 hours ago | parent | prev | next [-]

This is basically how you should treat all AI dev. Working around AI model limits for something that will take 3-6 months of work has very little ROI compared to building what works today and just waiting, then building what works tomorrow, tomorrow.

thephyber an hour ago | parent | next [-]

This assumes AI model improvements will be predictable, which they won’t.

There are several simultaneous moving targets: the different models available at any point in time, the model complexity/capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can't calculate comparative ROIs of model A today or model B next year unless these are far more predictable than they currently are.

sally_glance 9 hours ago | parent | prev [-]

This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.

As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...

yorwba 2 hours ago | parent | prev | next [-]

I think you took away the wrong lesson from that podcast:

> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until, let's say, 2000, where you had these game developers who would develop really impressive games on the current generation of hardware, and they'd do it by writing really detailed, intricate x86 instruction sequences for exactly whatever a 486 can do, knowing full well that in 2 years the Pentium is gonna be able to do this much faster and they didn't need to do it. But you need to do it now because you wanna sell your game today, and you can't just wait and have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.

Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.

theptip 8 hours ago | parent | prev | next [-]

It’s a good thing to keep in mind, but LLM + scaffolding is clearly superior. So if you just use vanilla LLMs you will always be behind.

I think the important thing is to avoid over-optimizing your scaffold, not to avoid building one altogether.

fragmede 8 hours ago | parent [-]

It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.

l33tman 3 hours ago | parent | next [-]

As the base is an auto-regressive model capable of generating more or less any kind of text, it kind of makes sense though. It always has the capability, but left undirected it might just as well emulate a stupid analysis. So you're leading in with text that describes what the rest of the text will be, in a pretty real sense.

AlexCoventry 6 hours ago | parent | prev [-]

They have no values of their own, so you have to direct their attention that way.

argee 7 hours ago | parent | prev | next [-]

> it broke my intuition about how to improve them

Here we go again.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

bitexploder 8 hours ago | parent | prev [-]

And if you have the better harness and the next model?

anitil 4 hours ago | parent [-]

I would _hope_ that the double combo would be better, but honestly I have no idea

btown 13 hours ago | parent | prev | next [-]

The problem, though, is that this turns "one of our developers was hit by a supply chain attack that never hit prod, we wiped their computer and rotated keys, and it's not like we're a big target for the attacker to make much use of anything they exfiltrated..." into "now our entire source code has been exfiltrated and, even with rudimentary line-by-line scanning, will be automatically audited for privilege escalation opportunities within hours."

Taken to an extreme, the end result is a dark forest. I don't like what that means for entrepreneurship generally.

linkregister 12 hours ago | parent | next [-]

This is a great example of a vulnerability chain that can be broken by scanning with even the cheaper open-source models. The outcome of a developer getting pwned doesn't have to lead to total catastrophe. Having trivial privilege escalations closed off means an attacker will need to be noisy and set off commodity alerting. All that blocks these entrepreneurs is the company's will to implement fixes for the 100 GitHub Dependabot alerts on their codebase.

It does mean that the hoped-for 10x productivity increase from engineers using LLMs is eroded by the increased need for extra time for security.

This take is not theoretical. I am working on this effort currently.

pixl97 10 hours ago | parent | next [-]

I disagree that it's extra time for security, it's the time we should have been spending in the first place.

fragmede 7 hours ago | parent | prev [-]

It's great news for developers. Extra spend on a development/test env so devs have no prod access, prod has no SSH access, and SREs get two laptops, the second being a Chromebook that only pulls credentials when absolutely necessary.

linkregister 4 hours ago | parent [-]

Yes, having a good development env with synthetic data and an inaccessible, secure prod env just got extra justification. I had never considered the secondary SRE laptop, but I think it might be a good idea.

eru 8 hours ago | parent | prev [-]

> Taken to an extreme, the end result is a dark forest.

Sorry, how does that work?

bryanrasmussen 5 hours ago | parent [-]

Since the suggestion is that new security-bug-finding LLMs will increase protection because the defender has access to the full source code, the dark forest fear would be: if an attacker can get all the source, the attacker will be in a better position.

This seems wrong however, as it ignores the arrow of time. The full source code has already been scanned and fixed for anything LLMs can find before hitting production; anyone exfiltrating your codebase can only use their models to find holes that are reachable via production and that your models for some reason did not find.

I don't think there is any reason to suppose non-nation-state actors will have better models available to them, so it is not a dark forest. Nation states will probably limit their attacks to specific targets, so most companies that secure their codebase using LLMs built for this will probably be in a significantly more secure position than today, and, I would think, the golden age of criminal hacking is drawing to a close. This assumes companies are smart enough to do it, however.

Furthermore, the worry about nation-state attackers still assumes that they will have better models, and I'm not sure that is likely either.

staplers 3 hours ago | parent [-]

  I would think, the golden age of criminal hacking is drawing to a close. This assume companies smart enough to do this however.
It's rarely the systems that are the weak link; it's the humans with backdoor access.

eru 8 hours ago | parent | prev | next [-]

> There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.

Well, you need to harden everything, the attacker only needs to find one or at most a handful of exploits.

lelanthran 13 minutes ago | parent [-]

> Well, you need to harden everything, the attacker only needs to find one or at most a handful of exploits.

Yeah, but it's not like the attacker knows where to look without checking everything, is it?

If you harden and fix 90% of vulns, the attacker may give up when their attempts reach 80% of vulns.

It's the same as it has ever been; you don't need to outrun the bear, you only need to outrun the other runners.

nl 3 hours ago | parent | prev | next [-]

> By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase

What accounts are these?

I've seen some people use this but I cannot imagine that anyone thinks this is the best.

For example I've had success telling LLMs to scan from application entry points and trace execution, and that seems an extremely obvious thing to do. I can't imagine others in the field don't have much better approaches.
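
For illustration, the entry-point idea can be approximated before any LLM is involved: build a rough call graph and walk it breadth-first from the handlers, so the model reviews code in roughly the order untrusted input reaches it. A toy sketch, with a hand-written call graph standing in for what a parser would extract:

```python
from collections import deque

def review_order(call_graph: dict[str, list[str]], entry_points: list[str]) -> list[str]:
    """BFS from entry points: functions closer to untrusted input get reviewed first."""
    seen, order = set(entry_points), []
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return order

# Hypothetical graph: an HTTP handler calling into parsing and storage code.
graph = {
    "handle_upload": ["parse_form", "save_blob"],
    "parse_form": ["decode_multipart"],
    "save_blob": ["sanitize_path"],
}
print(review_order(graph, ["handle_upload"]))
# → ['handle_upload', 'parse_form', 'save_blob', 'decode_multipart', 'sanitize_path']
```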

Retr0id 13 hours ago | parent | prev | next [-]

Tokens can also be burnt on decompilation.

tptacek 13 hours ago | parent | next [-]

Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.

jeffmcjunkin 12 hours ago | parent | next [-]

Can confirm. Matching decompilation in particular (where you match the compiler along with your guess at source, compile, then compare assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498

Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.

(also, hi Thomas!)

stackghost 11 hours ago | parent | next [-]

My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.

Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.

mikestaas 11 hours ago | parent | next [-]

LaurieWired did a good episode about that kind of thing https://www.youtube.com/watch?v=u2vQapLAW88

kimixa 8 hours ago | parent | prev [-]

That matches my experience too - LLMs are very capable at "translating" between domains - one of the best experiences I've had with LLMs is turning "decompiled" source into "human readable" source. I don't think "binary only" closed-source is the defense against this that some people here seem to think it is.

echelon 8 hours ago | parent | prev [-]

Has anyone used an LLM to deobfuscate compiled Javascript?

lelanthran a minute ago | parent | next [-]

> Has anyone used an LLM to deobfuscate compiled Javascript?

Seems like a waste of money; wouldn't it be better to extract the AST deterministically, write it out, and only then ask an LLM to replace the auto-generated symbol names with meaningful ones?
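
That division of labor (deterministic tooling first, model only for the judgment call) can be sketched with Python's own `ast` module standing in for a JS parser like Babel: mechanically collect the short, minifier-style identifiers, and only that name list would go to the LLM for renaming suggestions.

```python
import ast

def obfuscated_names(source: str, max_len: int = 2) -> set[str]:
    """Walk the AST and collect suspiciously short identifiers (likely minifier output)."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and len(node.id) <= max_len:
            names.add(node.id)
        elif isinstance(node, ast.FunctionDef) and len(node.name) <= max_len:
            names.add(node.name)
    return names

minified = "def f(a, b):\n    c = a + b\n    return c"
print(sorted(obfuscated_names(minified)))
# → ['a', 'b', 'c', 'f']
```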

heeen2 2 hours ago | parent | prev | next [-]

Yes, but it requires some nudging if you don't want to waste tokens. It will happily grep and sed through massive JavaScript bundles, but if you tell it to first create tooling like Babel scripts to format, it will be much quicker.

saagarjha 2 hours ago | parent | prev | next [-]

I've used it for hobby efforts on Electron/React Native (Hermes bytecode) apps and it seems to work reasonably well

bitexploder 8 hours ago | parent | prev [-]

Yep. They are good at it.

gfosco 9 hours ago | parent | prev [-]

Yeah, it's token intensive but worth it. I built a very dumb example harness which used IDA via MCP and analyzed/renamed/commented all ~67k functions in a binary, using Claude Haiku for about $150. A local model could've accomplished it for much less/free. The knowledge base it outputs and the marked up IDA db are super valuable.
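
Back-of-envelope on those numbers (the $150 and ~67k figures are from the comment above; the rest is trivial arithmetic):

```python
functions = 67_000       # functions analyzed/renamed/commented
total_cost_usd = 150.0   # reported Claude Haiku spend
per_function = total_cost_usd / functions
print(f"${per_function:.4f} per function")  # → $0.0022 per function
```

At roughly a fifth of a cent per function, it's easy to see why a local model could push that toward free.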

whattheheckheck 9 hours ago | parent [-]

Do you have the repo example?

heeen2 2 hours ago | parent [-]

I did something similar, using GhidraMCP to dig around this keyboard firmware. The repo contains the Ghidra project, a Linux driver, and even patches to the original stock firmware. https://github.com/echtzeit-solutions/monsgeek-akko-linux

somesortofthing 13 hours ago | parent | prev | next [-]

Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.

echelon 8 hours ago | parent | prev [-]

> Tokens can also be burnt on decompilation.

Prediction 1. We're going to have cheap "write Photoshop and AutoCAD in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.

Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.

Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.

Prediction 4. To push back, everything will move to thin clients.

jgraham 2 hours ago | parent [-]

I think if prediction 1 is true (that it becomes cheap to clone existing software in a way that doesn't violate copyright law), the response will not be purely technical (moving to thin clients, or otherwise trying to technically restrict the access surface to make reverse engineering harder). Instead I'd predict that companies look to the law to replace the protections that they previously got from copyright.

Obvious possibilities include:

* More use of software patents, since these apply to underlying ideas, rather than specific implementations.

* Stronger DMCA-like laws which prohibit breaking technical provisions designed to prevent reverse engineering.

Similarly, if the people predicting that humans are going to be required to take ultimate responsibility for the behaviour of software are correct, then it clearly won't be possible for that to be any random human. Instead you'll need legally recognised credentials to be allowed to ship software, similar to the way that doctors or engineers work today.

Of course these specific predictions might be wrong. I think it's fair to say that nobody really knows what might have changed in a year, or where the technical capabilities will end up. But I see a lot of discussions and opinions that assume zero feedback from the broader social context in which the tech exists, which seems like they're likely missing a big part of the picture.

bryanrasmussen 5 hours ago | parent | prev | next [-]

>By all accounts, the best LLM cyber scanning approaches are really primitive

It seems like that is perhaps not the case anymore with the Mythos model?

kelvinjps10 11 hours ago | parent | prev | next [-]

what about open source software?
