TacticalCoder 8 hours ago
> That's why their point is what the subheadline says, that the moat is the system, not the model.

Can you expand a bit more on this? What is the system then in this case? And how was that model created? By AI? By humans?
SCHiM 8 hours ago | parent
You can imagine a pipeline that looks at individual source files or functions and first "extracts" what is going on. You ask the model:

- "Is the code doing arithmetic in this file/function?"
- "Is the code allocating and freeing memory in this file/function?"
- "Is the code doing X/Y/Z?" etc. etc.

For each question, you design the follow-up vulnerability searches. For a function you see doing arithmetic, you ask:

- "Does this code look like integer overflow could take place?"

For memory:

- "Do all the pointers end up being freed?" _or_
- "Do all pointers only get freed once?"

I think that's the harness part in terms of generating the "bug reports". From there on, you'll need a bunch of tools for the model to interact with the code. I'd imagine you'll want to build a harness/template for the file/code/function to be loaded into and executed under ASAN. If you have an agent that thinks it found a bug ("Yes, file xyz looks like it could have an integer overflow in function abc at line 123, because..."), you force another agent to load it in the harness under ASAN and call it. If ASAN reports a bug, great: you can move the bug to the next stage, some sort of taint analysis or reachability analysis.

So at this point you're running a pipeline to:

1) Extract "what this code does" at the file, function or even line level.
2) Put code you suspect of being vulnerable in a harness to verify agent output.
3) Put code you confirmed is vulnerable into a queue to perform taint analysis on, to see if it can be reached by attackers.

Traditionally, I guess a fuzzer approached this from 3 -> 2, and there was no "stage 1". Because LLMs "understand" code, you can invert this system and work it up from "understanding", i.e. approach it from the other side. You ask: given this code, is there a bug, and if so, can we reach it? Instead of asking: given this public interface and a bunch of data we can stuff into it, does something happen we consider exploitable?
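The three stages above can be sketched as a small orchestration loop. This is a minimal sketch under stated assumptions, not anyone's actual system: the `ask_model` and `run_under_asan` callables, the `EXTRACTION` question table, and the `Finding` record are all hypothetical names standing in for an LLM call and for compiling/running the harness under AddressSanitizer.

```python
from dataclasses import dataclass

# Stage-1 extraction categories, each unlocking follow-up vulnerability
# questions (hypothetical table, mirroring the examples in the comment).
EXTRACTION = {
    "arithmetic": ["Does this code look like integer overflow could take place?"],
    "memory":     ["Do all the pointers end up being freed?",
                   "Do all pointers only get freed once?"],
}

@dataclass
class Finding:
    function: str
    question: str
    confirmed: bool = False   # set once the harness run reproduces it

def pipeline(functions, ask_model, run_under_asan):
    """1) extract what the code does -> 2) verify in an ASAN harness
    -> 3) queue confirmed bugs for taint/reachability analysis."""
    taint_queue = []
    for fn in functions:
        for category, checks in EXTRACTION.items():
            # Stage 1: cheap classification question per file/function.
            if not ask_model(f"Is the code doing {category} in {fn}?"):
                continue
            for check in checks:
                # Stage 1b: targeted vulnerability question.
                if not ask_model(f"{check} ({fn})"):
                    continue
                finding = Finding(fn, check)
                # Stage 2: a second agent reproduces the claim under ASAN;
                # only sanitizer-confirmed findings advance.
                if run_under_asan(fn):
                    finding.confirmed = True
                    taint_queue.append(finding)   # input to stage 3
    return taint_queue
```

With stub callables (a model that only flags `parse_header`, a harness that only crashes on it), only confirmed findings for that one function reach the taint queue; the point is the funnel shape, not the stubs.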