Remix.run Logo
Show HN: PII-Shield – Log Sanitization Sidecar with JSON Integrity (Go, Entropy)(github.com)
14 points by aragoss 6 hours ago | 8 comments

What PII-Shield does: It's a K8s sidecar (or CLI tool) that pipes application logs, detects secrets using Shannon entropy (catching unknown keys like "sk-live-..." without predefined patterns), and redacts them deterministically using HMAC.

Why deterministic? So that "pass123" always hashes to the same "[HIDDEN:a1b2c]", allowing QA/Devs to correlate errors without seeing the raw data.

Key features: 1. JSON Integrity: It parses JSON, sanitizes values, and rebuilds it. It guarantees valid JSON output for your SIEM (ELK/Datadog). 2. Entropy Detection: Uses context-aware entropy analysis to catch high-randomness strings. 3. Fail-Open: Designed as a transparent pipe wrapper to preserve app uptime.

The project is open-source (Apache 2.0).

Repo: https://github.com/aragossa/pii-shield Docs: https://pii-shield.gitbook.io/docs/

I'd love your feedback on the entropy/threshold logic!

aragoss 3 hours ago | parent | next [-]

Update: Seeing some folks pulling the Docker image. Just a heads up — the default entropy threshold is 3.8, which is tuned for API keys. If you are testing with simple words like 'test', it might not catch them (by design). Check the README for tweaking PII_ENTROPY_THRESHOLD.

gritspants 21 minutes ago | parent | prev | next [-]

Thanks for this. Showed some of my colleagues who thought it was cool also.

maxbond 3 hours ago | parent | prev [-]

Cool project!

- Wouldn't this censor UUIDs? I want UUIDs to remain in my logs.

- The never "PII Shield" makes me think this would censor entities like names or social security numbers, rather than secrets. Not a big deal though.

aragoss 3 hours ago | parent [-]

Thanks!

UUIDs: By default—no. Since UUIDs are Hex (limited charset 0-f), they have lower entropy than Base64 secrets. The threshold is tuned to sit safely above UUIDs but below API keys.

Naming: You are totally right. Currently, it focuses on "high-entropy PII" (passwords, auth tokens, session IDs) rather than names or SSNs. "Secrets Shield" might have been more precise, but naming is hard :)

hangonhn 3 hours ago | parent [-]

So depending on the context UUID can be PII. Is this something we can customize or adjust?

aragoss 3 hours ago | parent [-]

Yes, absolutely.

You can fine-tune the sensitivity via the PII_ENTROPY_THRESHOLD environment variable.

If you consider UUIDs to be sensitive in your context (or if you are getting false positives), you can adjust the threshold. For example, standard UUIDs have lower entropy density than API keys, so slightly tuning the value (e.g. from 3.8 to 3.2 or vice-versa) allows you to draw the line where you need it.

hangonhn 2 hours ago | parent [-]

Is there a way to tell it to just recognize UUIDs and redact those without adjusting the threshold? In our case, UUIDs is just an exception. I think all the other stuff you're doing is correct for our situation.

aragoss 2 hours ago | parent [-]

Currently, no — the scanner focuses on entropy and specific Key Names, not value patterns (Regex).

However, if your UUIDs live in consistent fields (e.g., request_id, trace_token, uuid), you can add those field names to the Sensitive Keys list. This forces redaction for those specific fields regardless of their entropy score, while keeping the global threshold high for everything else.

That said, "Redact by Value Regex" (to catch UUIDs anywhere) is a great idea. I'll add it to the backlog.