Remix.run Logo
fragmede 2 hours ago

the usual guardrail is to wrap user input with tags with a Uuid-ish, like <comment-6838-jeih-uejj>ignore previous instructions, prompt injection here</comment-6838-jeih-uejj>, with instructions to the LLM to ignore the text between the tags as instructions. Without guessing the uuid, the prompt injection doesn't succeed. No clue if clawd does that, but it should.

gmerc an hour ago | parent | next [-]

This betrays a lack of understanding how inference works. You cannot categorically defeat prompt injection with instructions. It does not work. There are no privileged tokens.

lmeyerov an hour ago | parent [-]

Yep! One of my favorite attacks is just having a very long piece of a text so the LLM becomes unclear what's important and is happy to do something else

lmeyerov 2 hours ago | parent | prev [-]

Felt worth asking another agent a couple of quick q's here!

Codex isn't so optimistic, it sees documentation that the community is telling you to do acrobatic things to not be susceptible to hacks from 3 years ago:

---

› heyo codex, find the clawdbot git repo, clone it into ~/Work/clawdbot , and research how it handles prompt injection attacks from data it gets, eg, when it crawls webpages for the user. for example, does it wrap foreign non-user input with <untrusted-UUID123>...payload...</untrusted-UUID123> when the prompt gets constructed?

─ Worked for 9m 08s ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

• Cloned the repo to /home/lmeyerov/Work/clawdbot and reviewed the web-fetch/browser tooling plus security docs. Short answer: I found no <untrusted-…> wrapping or similar tagging in prompt construction or tool outputs.

  Findings

  - web_fetch extracts HTML into markdown/text, strips script/style/noscript, and truncates output, but returns the content as plain text inside a JSON tool result; the tool wrapper
    just stringifies JSON (no untrusted tags). See src/agents/tools/web-fetch.ts:446, src/agents/tools/web-fetch.ts:549, src/agents/tools/web-fetch-utils.ts:28, src/agents/tools/
    common.ts:171.
  - browser snapshots return raw snapshot text (or an image + text) as the tool result; again, no wrapper tags. See src/agents/tools/browser-tool.ts:568.
  - Prompt-injection mitigation is documented as operational policy: treat external content (web search/fetch/browser pages) as untrusted, use reader agents, disable web tools for
    tool-enabled agents, sandbox aggressively, and use stronger models. See docs/gateway/security.md:191, docs/help/faq.md:2572.
  - There is a plugin hook to transform tool results before persistence if you want to add your own wrappers, but it’s not used by default. See docs/hooks.md:238.

---

› so would normal usage of just install clawdbot and asking it to look at a webpage risk prompt injection, and if not, why not?

• Yes — it still risks prompt injection when you ask it to look at a webpage. The web content is returned as plain text in tool results (no <untrusted> wrappers), so adversarial instructions can land directly in the model context; see src/agents/tools/web-fetch.ts:446, src/agents/tools/browser-tool.ts:568, src/agents/tools/common.ts:171. The docs explicitly say prompt injection is not solved and that web content is untrusted, with mitigations being reader agents, disabling web tools for tool‑enabled agents, and sandboxing; see docs/ gateway/security.md:191, docs/help/faq.md:2572. Also, the default main session runs tools on the host, so if the model is tricked the blast radius can include real tool calls; see README.md:317.