▲ | maerch 5 days ago | |
I’m really trying to understand your point, so please bear with me. As I see it, this prompt is essentially an "executable script". In your view, should all prompts be analyzed and possibly blocked based on heuristics that flag malicious intent? Should we also prevent the LLM from simply writing an equivalent script in a programming language, even if it is never executed? How is this different from requiring all programming languages (at least from big companies with big engineering teams) to include such security checks before code is compiled? | ||
▲ | echelon 5 days ago | parent [-] | |
Prompts are not just executable scripts. They are API calls to servers that are listening and that can provide dynamic responses. These companies can staff up a team to begin countering this. It's going to be necessary going forward. There are inexpensive, specialized models that can quickly characterize adversarial requests. It doesn't have to be perfect, just enough to assign a risk score. Say from [0, 100], or whatever normalized range you want. A combination of online, async, and offline systems can analyze the daily flux in requests and flag accounts and query patterns that need further investigation. This can happen when diverse risk signals trigger heuristics. Once a threshold has been triggered, it can escalate to manual review, rate limiting, a notification sent to the user, or even automatic account temporary suspension. There are plenty of clues in this attack behavior that can lead to the tracking and identification of some number of attackers, and the relevant bodies can be made aware of any positively ID'd attackers: any URLs, hostnames, domains, accounts, or wallets that are being exfiltrated to can be shut down, flagged, or cordoned off and made subject of further investigation by other companies or the authorities. Countermeasures can be deployed. The entire system can be mathematically modeled and controlled. It can be observed, traced, and replayed as an investagorory tool and means of restitution. This is part of a partnership with law enforcement and the broader public. Red teams, government agencies, other companies, citizen bug and vuln reporters, customers, et al. can participate once the systems are built. |