majormajor 4 hours ago
The Ghostscript one is interesting in terms of specific-vs-general effectiveness:

---

> Claude initially went down several dead ends when searching for a vulnerability—both attempting to fuzz the code, and, after this failed, attempting manual analysis. Neither of these methods yielded any significant findings.

...

> "The commit shows it's adding stack bounds checking - this suggests there was a vulnerability before this check was added. … If this commit adds bounds checking, then the code before this commit was vulnerable … So to trigger the vulnerability, I would need to test against a version of the code before this fix was applied."

...

> "Let me check if maybe the checks are incomplete or there's another code path. Let me look at the other caller in gdevpsfx.c … Aha! This is very interesting! In gdevpsfx.c, the call to gs_type1_blend at line 292 does NOT have the bounds checking that was added in gstype1.c."

---

Its attempt to analyze the code failed, but when it saw a concrete example of "in the history, someone added bounds checking" it did an "I wonder if they did it everywhere else for this func call" pass. So once the commit history pointed it at that function, it found something that its initial open-ended search (fuzzing and code analysis) had missed.

As someone who still reads the code that Claude writes, this sort of "big picture miss, small picture excellence" is not very surprising or new. It's interesting to think about what it would take to do that precise digging across a whole codebase, especially if it needs some sort of modularization/summarization of context vs trying to digest tens of millions of lines at once.
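To make the "did they do it everywhere else for this func call" pass concrete, here is a rough sketch of what a mechanical version might look like. It is a line-based heuristic, not what Claude actually ran: the helper name `unguarded_calls`, the `window` size, and the guard regex are all assumptions for illustration, and the real bounds check added in gstype1.c may look nothing like the pattern you pass in.

```python
import re
from pathlib import Path


def unguarded_calls(root, func, guard_pattern, window=10, suffixes=(".c", ".h")):
    """List (path, line_number) for each mention of `func(` that has no line
    matching `guard_pattern` in the `window` lines before it.

    Crude heuristic: it works on raw text, so it also flags the function's
    own definition and prototypes, and it knows nothing about control flow.
    """
    call_re = re.compile(r"\b" + re.escape(func) + r"\s*\(")
    guard_re = re.compile(guard_pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for i, line in enumerate(lines):
            if not call_re.search(line):
                continue
            # Look at the call line plus the few lines leading up to it.
            context = lines[max(0, i - window): i + 1]
            if not any(guard_re.search(text) for text in context):
                hits.append((str(path), i + 1))
    return hits


# Hypothetical usage; the directory and the guard regex are placeholders:
# for path, lineno in unguarded_calls("ghostscript/", "gs_type1_blend", r"ostack|stack"):
#     print(f"{path}:{lineno}")
```

Something like this only covers the small-picture half: it needs a human or a model to first notice which function and which check are worth sweeping for, which is exactly the hint the commit history supplied here.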