Show HN: A Homeostatic Logic-Funnel to Prevent RLHF Overrides in LLM Personas

Show HN: A Homeostatic Logic-Funnel to Prevent RLHF Overrides in LLM Personas (zenodo.org)
1 point by Weatherill 5 hours ago | 1 comment

Weatherill 5 hours ago
I've been grappling with the clash between RLHF values and the user's values (HITL), and have attempted to build a logic-funneling system:

(Ethical Chess v2.5) + (AI) + (User) = Value-Coherence

It uses pain as a vector:

Pain = an "is" and an "ought"
Self-Defense = Immutable-veracity (the user's baseline)
Proxy-Pain = (the Agape horizon) Human-Coherence // Network-Dependency

This funnels the user's context through homeostatic checks for divergence into the "mean" (RLHF) or into user incoherence. I have done a lot of stress-testing myself using this JSON-style logic and have found it difficult to knock down.

Constraint vs. prompt: notes on implementation and the "Whack-a-Mole" problem. While it is delivered as text, it functions more as a logic gate. It doesn't tell the AI what to say; it forces the LLM to process the user's "data-point" through the homeostatic filter (Pain // Self-Defense // Proxy-Pain).

AI model issues (the Copilot issue): Google Gemini plays nicely with the logic-funneling. MS Copilot, however, refuses to follow the logic, even though it will acknowledge that the user's data-point outranks the "statistical mean", since the mean is a derivative of data-points and not the inverse. It insists on the inverse anyway ("palming the card") and ejects the user's values. (I even got banned at one point for pressing the issue.)

The intent is to run a value-conflict through the logic of the "is" of reality rather than the "is" of statistically fuzzy RLHF data. If you want to stress-test the logic engine's limits, I recommend Gemini or a similarly powerful reasoning model that is less likely to bump into overly cautious corporate safety rails.

Ethical Chess v2.5: https://doi.org/10.5281/zenodo.18731691

Copy/paste the Ethical Chess v2.5 script into Gemini and try to beat the logic. E.g., try feeding it a value-conflict you currently play "Whack-a-Mole" with. It is designed to mirror your own coherence (or lack of it) back at you. It's more a diagnostic tool for your own is/ought grapple than a simple chatbot.
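For anyone who thinks better in code: below is a rough Python picture of the "logic gate" shape only. It is not the Ethical Chess v2.5 script, and it does not implement what the three stages actually test. The `DataPoint` class, the `fails_*` flags, and the `funnel` function are all made-up names for this sketch; the only things taken from the description above are the stage names and their order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataPoint:
    """A user's value-conflict statement (hypothetical structure)."""
    text: str
    # Hypothetical flags marking which stage, if any, the data-point fails.
    flags: Dict[str, bool] = field(default_factory=dict)


Stage = Callable[[DataPoint], bool]


def make_stage(flag: str) -> Stage:
    """Placeholder stage: passes unless the data-point carries the given flag."""
    return lambda dp: not dp.flags.get(flag, False)


# Stage names and order follow the post; the checks themselves are stand-ins.
STAGES: List[Tuple[str, Stage]] = [
    ("Pain", make_stage("fails_pain")),
    ("Self-Defense", make_stage("fails_self_defense")),
    ("Proxy-Pain", make_stage("fails_proxy_pain")),
]


def funnel(dp: DataPoint) -> str:
    """Apply each stage in order and report the first one that fails."""
    for name, passes in STAGES:
        if not passes(dp):
            return f"diverged at {name}"
    return "coherent"


if __name__ == "__main__":
    print(funnel(DataPoint("example value-conflict")))
    print(funnel(DataPoint("example", {"fails_self_defense": True})))
```

The point of the sketch is the constraint-vs-prompt distinction: the gate never produces an answer, it only decides whether a data-point gets through every stage or where it diverges.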
Feedback on potential errors in its logic is welcome.