Making frontier cybersecurity capabilities available to defenders (anthropic.com)
76 points by surprisetalk 4 hours ago | 28 comments

baby 21 minutes ago
As a founder of an auditing firm, I definitely feel the heat of the competition when big LLM companies push products that compete not only with us as auditors but also with our own AI-based offerings (https://zkao.io/).

If I were to venture a guess, there are different worlds in which we might exist in the next 5-10 years.

In one of these futures, we, as auditors, cease to exist. If this is the future, then developers cease to exist too, and most people touching software cease to exist. My guess here is as good as any developer's guess on whether their job will remain stable.

In another one of these futures, we auditors become more specialized, more niche, and bring the "human touch" needed or required. Serious companies will want to continue working with some humans, and delegating security to "someone". That someone could be embedded in the company, or they could be a SaaS+human-support system like zkao.

On the other hand, vibe coders will definitely use Claude Code Security; maybe we should call it "vibe security"? I don't mean it as a diss, I vibe code myself, but it will most likely be as good as vibe coding in the sense that you might have to spend time understanding it, it might make a lot of mistakes, and it will be "good enough" for a lot of use cases.

I think that world is a bit more realistic today than the AGI "all of our jobs are gone in the next years" doom claim. And as @zksecurityXYZ, I don't think we're too scared of that world. These tools have been, and are, making us stronger auditors. We're a small, highly specialized team that's resilient and hard to replace. On the other hand, large consultancies, and especially consultancies that focus on low-hanging fruit like web security and smart contracts, are ngmi.

ievans 3 hours ago
Not super surprising that Anthropic is shipping a vulnerability detection feature -- OpenAI announced Aardvark back in October (https://openai.com/index/introducing-aardvark/) and Google announced BigSleep in Nov 2024 (https://cloud.google.com/blog/products/identity-security/clo...).

The impact question is really around scale; a few weeks ago Anthropic claimed 500 "high-severity" vulnerabilities discovered by Opus 4.6 (https://red.anthropic.com/2026/zero-days/). There's been some skepticism about whether they are truly high severity, but it's a much larger number than what BigSleep found (~20) and Aardvark hasn't released public numbers.

As someone who founded a company in the space (Semgrep), I really appreciated that the DARPA AIxCC competition required players using LLMs for vulnerability discovery to disclose $cost/vuln and the confusion matrix of false positives along with it. It's clear that LLMs are super valuable for vulnerability discovery, but without that information it's difficult to know which foundation model is really leading.

What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better, esp. when it comes to false positives. We think the future is more "virtual security engineer" agents using tools with humans acting as the appsec manager. Would be very interested to hear from other people on HN who have been trying this approach!

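For readers who want to report their own runs in the style described in the comment above ($cost/vuln plus a confusion matrix), here is a minimal sketch. The class name, field names, and all the numbers are hypothetical illustrations, not figures from AIxCC or any vendor.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Confusion-matrix counts and spend for one scanner run (all values hypothetical)."""
    true_positives: int    # flagged and confirmed as real vulnerabilities
    false_positives: int   # flagged but not real
    false_negatives: int   # real vulnerabilities the scanner missed
    true_negatives: int    # clean locations correctly left unflagged
    total_cost_usd: float  # total spend for the run (e.g. model tokens)

    def precision(self) -> float:
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    def recall(self) -> float:
        real = self.true_positives + self.false_negatives
        return self.true_positives / real if real else 0.0

    def cost_per_confirmed_vuln(self) -> float:
        # Cost is divided by confirmed findings only, so false positives raise it.
        if self.true_positives == 0:
            return float("inf")
        return self.total_cost_usd / self.true_positives


# Made-up example run: 20 findings flagged, 8 of them real, 2 real ones missed.
example = ScanResult(
    true_positives=8,
    false_positives=12,
    false_negatives=2,
    true_negatives=78,
    total_cost_usd=40.0,
)
print(f"precision={example.precision():.2f} "
      f"recall={example.recall():.2f} "
      f"cost/vuln=${example.cost_per_confirmed_vuln():.2f}")
```

Dividing cost by confirmed findings rather than by raw findings is one possible choice; it keeps a noisy scanner from looking cheap.
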
upghost 4 hours ago
Anakin: I'm going to save the world with my AI vulnerability scanner, Padme.
Padme: You're scanning for vulnerabilities so you can fix them, Anakin?
Anakin: ...
Padme: You're scanning for vulnerabilities so you can FIX THEM, right, Annie?

sanketsaurav 2 hours ago
FWIW Claude Code Opus 4.5 scores ~71% accuracy on the OpenSSF CVE Benchmark that we ran against DeepSource (https://deepsource.com/benchmarks).

We have a different approach, in that we're using SAST as a fast first pass on the code (also helps ground the agent, more effective than just asking the model to "act like a security researcher"). Then, we're using pre-computed static analysis artifacts about the code (like data flow graphs, control flow graphs, dependency graphs, taint sources/sinks) as "data sources" accessible to the agent when the LLM review kicks in.

As a result, we're seeing higher accuracy than others. Haven't gotten access to this new feature yet, but when we do we'll update our benchmarks.

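The two-stage shape described in the comment above (a SAST first pass, then a review step that can consult precomputed artifacts) could be sketched roughly as follows. This is not DeepSource's implementation: every name here (`Finding`, `attach_artifacts`, `triage`, `stub_reviewer`) is hypothetical, the artifacts are reduced to a plain dictionary of taint flows, and the LLM call is replaced by a stub.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    """One candidate issue from the fast first-pass scanner (hypothetical shape)."""
    rule_id: str
    file: str
    line: int
    context: dict = field(default_factory=dict)


# Precomputed artifact: taint flows keyed by the (file, line) of the sink.
TaintFlows = dict[tuple[str, int], list[str]]


def attach_artifacts(finding: Finding, taint_flows: TaintFlows) -> Finding:
    """Attach any precomputed taint flows that end at the finding's location."""
    finding.context["taint_flows"] = taint_flows.get((finding.file, finding.line), [])
    return finding


def triage(findings: list[Finding], taint_flows: TaintFlows,
           reviewer: Callable[[Finding], bool]) -> list[Finding]:
    """Second pass: keep only the findings the reviewer confirms."""
    return [f for f in (attach_artifacts(f, taint_flows) for f in findings) if reviewer(f)]


def stub_reviewer(finding: Finding) -> bool:
    # Stand-in for the LLM review step (an assumption for illustration):
    # confirm a finding only if some tainted input actually reaches it.
    return bool(finding.context["taint_flows"])


first_pass = [
    Finding("sql-injection", "app/db.py", 42),
    Finding("sql-injection", "app/report.py", 17),
]
flows = {("app/db.py", 42): ["request.args['q'] -> build_query -> cursor.execute"]}

confirmed = triage(first_pass, flows, stub_reviewer)
print([f"{f.file}:{f.line}" for f in confirmed])
```

In a real setup the reviewer would be a model call that receives the finding plus its attached context; the stub only shows where that grounding data would plug in.
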
nadis 3 hours ago
> "Rather than scanning for known patterns, Claude Code Security reads and reasons about your code the way a human security researcher would: understanding how components interact, tracing how data moves through your application, and catching complex vulnerabilities that rule-based tools miss."

Fascinating! Our team has been blending static code analysis and AI for a while and think it's a clever approach for the security use case the Anthropic team's targeting here.

bink 3 hours ago
I hope this is better than their competitors' products. So far I've been underwhelmed. They basically just find stuff that's already identified by static analysis tooling and toss in a bunch of false positives from the AI scans.

david_shaw 3 hours ago
There's a lot of skepticism in the security world about whether AI agents can "think outside the box" enough to replicate or augment senior-level security engineers.

I don't yet have access to Claude Code Security, but I think that line of reasoning misses the point. Maybe even the real benefit.

Just like architectural thinking is still important when developing software with AI, creative security assessments will probably always be a key component of security evaluation.

But you don't need highly paid security engineers to tell you that you forgot to sanitize input, or you're using a vulnerable component, or to identify any of the myriad issues we currently use "dumb" scanners for.

My hope is that tools like this can help automate away the "busywork" of security. We'll see how well it really works.

vimda an hour ago
I would love to know how this compares to just prompting Claude Code with "please find and fix any security vulnerabilities in this code"

drcongo 4 hours ago
I thought they'd noticed how many of my Claude tokens I've been burning trying to build defences against the AI bot swarms. Sadly not.

grolly 2 hours ago
Limited preview for researchers, who will be hand picked to write positive reviews.

Enough of this frontier grifting. Make it testable for open source developers at no cost and without login or get lost.

You won't of course, because you'd get an unfiltered evaluation instead of guerilla marketing via blog posts, secrecy, and name-dropping researchers that cannot be disclosed.

deadbabe 4 hours ago
Solve a problem and everyone praises you. No one knows you also caused that problem.