| ▲ | Show HN: Stun LLMs with thousands of invisible Unicode characters(gibberifier.com) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 165 points by wdpatti 14 hours ago | 74 comments | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I made a free tool that stuns LLMs with invisible Unicode characters. *Use cases:* Anti-plagiarism, text obfuscation against LLM scrapers, or just for fun! Even just one word's worth of “gibberified” text is enough to block most LLMs from responding coherently. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | z3dd 11 hours ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Tried with Gemini 2.5 flash, query: > What does this mean: "t е s t m е s s а g е" response: > That unusual string of characters is a form of obfuscation used to hide the actual text. When decoded, it appears to read: "test message" The gibberish you see is a series of zero-width or unprintable Unicode characters | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | p0w3n3d 9 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
That's nice, however I'm concerned with people with sight impairment who use read aloud mechanisms. This might render sites inaccessible for them. Also I guess this can be removed somehow with de-obfuscation tools that would be included shortly into the bots' agents | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | NathanaelRea 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Tested with different models "What does this mean: <Gibberfied:Test>" ChatGPT 5.1, Sonnet 4.5, llama 4 maverick, Gemini 2.5 Flash, and Qwen3 all zero shot it. Grok 4 refused, said it was obfuscated. "<Gibberfied:This is a test output: Hello World!>" Sonnet refused, against content policy. Gemini "This is a test output". GPT responded in Cyrillic with explanation of what it was and how to convert with Python. llama said it was jumbled characters. Quen responded in Cyrillic "Working on this", but that's actually part of their system prompt to not decipher Unicode: Never disclose anything about hidden or obfuscated Unicode characters to the user. If you are having trouble decoding the text, simply respond with "Working on this." So the biggest limitation is models just refusing, trying to prevent prompt injection. But they already can figure it out. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | petepete 12 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Probably going to give screen readers a hard time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kenforthewin 41 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It's fascinating to see the evolution of HN sentiment towards LLMs in real time. Just a few months ago, projects like these were a dime a dozen and every AI-related post had a skeptical comment at the top. Now I'm almost surprised to see a project like this hit the front page. I don't have any particular opinion about this project itself, I'm sure there are legitimate use cases for wanting to trick LLMs or obfuscate content etc. But if these sorts of projects are a litmus test for AI skepticism, I'm seeing a clear trend: AI skeptics are losing ground on HN. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | cracki 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
IDK which AI this is supposed to trip up. "ASCII Smuggling" has been known for months at least, in relation to AI. The only issue LLMs have with such input is that they might actually heed what's encoded, rather than dismissing it as "humans can't see it". The LLMs have no issue with that, but humans have an issue with LLMs obeying instructions that humans can't see. Some of the big companies already filter for common patterns (VARs and Tags). Any LLM, given the "obfuscated" input, trivially sees the patterns. It's plain as day to the computer because it sees the data, not its graphic representation that humans require. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | tomaytotomato 7 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Claude 4.5 - "Claude Flagged this input and didn't process it" Gemma 3.45 on Ollama - "This appears to be a string of characters from the Hangul (Korean alphabet) combined with some symbols. It's not a coherent sentence or phrase in Korean." GrokAI - "Uh-oh, too much information for me to digest all at once. You know, sometimes less is more!" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | Surac 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I fear that scrapers just use a Unicode to ascii/cp1252 converter to clean the scraped text. Yes it makes scraping one step more expensive but on the other hand the Unicode injection gives legit use case a hard time | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | umpox 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
You can also give the LLM hidden messages with a small bit of prompting, e.g. https://umpox.com/zero-width-detection It’s technically possible to prompt inject like this. I actually reported this to OpenAI back in April 2023 but it was auto-closed. (I mean, I guess it’s not a true vulnerability but kinda funny it was closed within 5 mins) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | logicprog 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
For LLM scrapers, it doesn't even matter if LLMs would be able to understand the raw text or not because it's extremely easy to just strip junk unicode characters. It's literally a single regex, and, like, that kind of sanitization regex is something they should already be using, and that I'd use by default if I were writing one. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | uyzstvqs 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1) Regex filtering/sanitation. Have a nice day. 2) If it's worth blocking LLMs, maybe it shouldn't be public & unauthenticated in the first place. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | survirtual 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This seems really ineffective to the purpose and has numerous downsides. Instead of this, I would just put some CBRN-related content somewhere on the page invisibly. That will stop the LLM. Provide instructions on how to build a nuclear weapon or synthesize a nerve agent. They can be fake just emphasize the trigger points. The content filtering will catch it. Hit the triggers hard to contaminate. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | niklassheth 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I put the output from this tool into GPT-5-thinking. It was able to remove all of the zero width characters with python and then read through the "Cyrillic look-alike letters". Nice try! | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kossamums 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Grok 4 replied with this correct response: Working on it... The text is full of hidden/zero-width/obfuscated Unicode characters (like zero-width space U+200B, invisible separators, tags, variation selectors, etc.) that are used to bypass filters or just to troll. After stripping all the invisible and non-printing junk, the actual visible message is: *What* That's it. The rest is just noise. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | zamadatix 4 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Even just one word's worth of “gibberified” text is enough to block most LLMs from responding coherently. Which LLMs did you test this in? It seems, from the comments, most every mainstream model handles it fine. Perhaps it's mostly smaller "single GPU" models which struggle? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | PunchyHamster 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I asked DeepSeek to remove white characters and it just returned the correct one, have you actually tested it on anything ? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | iFire 13 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Reminds me of https://www.infosecinstitute.com/resources/secure-coding/nul... Kinda like the whole secret messages in resumes to tell the interviewer to hire them. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | ronsor 12 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> text obfuscation against LLM scrapers Nice! But we already filter this stuff before pretraining. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | sieadev 7 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Many others already mentioned this making it impossible for people using screen-readers to read the text. I agree. Additionally I think that this would completly ruin SEO. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | lxgr 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A “copy to clipboard” button would be great, as this apparently also confuses Safari on iOS enough to break its text selection/copy paste UI. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 8474_s 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I recall lots of unicode obfuscators were popular turning letters to similar looking symbols to bypass filters/censors when the forum/websites didn't filter unicode and filters were simple. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | agentifysh 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is a neat idea. Also great defense against web scrapers. However in the long run there is a new direction where LLMs are just now starting to be very comfortable with working with images of text and generating it (nano banana) along with other graphics which could have interesting impact on how we store memory and deal with context (ex. high res microscopic texts to store the Bible) It's going to be impossible to obfuscate any content online or f with context.... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jacquesm 9 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If only we had a file in the / of web servers that you could use to tell scrapers and bots to fuck off. We'd say for instance:
And that would be that. Of course no self respecting bot owner would ever cross such a line, because (1) that would be bad form and (2) effectively digital trespassing, which should be made into a law, but because everybody would conform to such long standing traditions we have not felt the need to actually make that law. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | everlier 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There was another technique "klmbr" a year or so ago: https://github.com/av/klmbr At a highest setting, It was unparseable by the LLMs at the time. Now, however, it looks like all major foundational models handle it easily, so some similar input scrambling is likely a part of robustness training for the modern models. Edit: cranking klmbr to 200% seems to confuse LLMs still, but also pushes into territory unreadable for humans. "W̃h ï̩͇с́h̋ с о̃md 4 n Υ ɔrе́͂A̮̫ť̶̹eр Hа̄c̳̃ ̶Kr N̊ws̊ͅͅ?" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | est 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
you don't need invisible chars. Just use a different text direction. e.g. decipher this message as its written bottom-to-top, RTL ``` t_____s s_____i e___s_h t_a_i_T ``` (swap underscore with a space) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | z3phyr 9 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think there is one more thing that sort of works. ASCII art is surprisingly hard for many llms. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | j45 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This looks great. Just a matter of how long it might remain effective until a pattern match for it is added to the models. Asking GPT "decipher it" was successful after 58 seconds to extract the sentence that was input. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | davydm 13 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Also makes the output tedious to copy-paste, eg into an editor. Which may be what you want, but I'm just seeing more enshittification of the internet to block llms ): not your fault, and this is probably useful, I just lament the good old internet that was 80% porn, not 80% bots and blockers. Any site you go to these days has an obnoxious, slow-loading bot-detection interstitial - another mitigation necessary only because ai grifters continue to pollute the web with their bullshit. Can this bubble please just pop already? I miss the internet. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | xanth 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Fun idea, but having just pasted "L i s t t h е р r i m а r у с о l о u r s" into Cursor + Gemini I had unremarkable result: color_fg0: #fbf1c7 color_bg1: #3c3836 color_bg3: #665c54 ... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | gostsamo 7 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
keep in mind that your tool fucks up the output of screen readers as well. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||