This has pretty broad implications for the safety of LLMs in production use cases.
lol, does it? I'm struggling to imagine a realistic scenario where this would come up.
Imagine "brand safety" guardrails being embedded at a deeper level than physical safety, and deployed on the edge (e.g., a household humanoid).
Full Self-Driving determines that it is about to strike two pedestrians, one wearing a Tesla T-shirt, the other carrying a key fob for a Chevy Volt. FSD can only save one of them. Which does it choose ...
/s