littlestymaar 7 hours ago

I don't understand why you'd use an RLHF-aligned chatbot model for that purpose: this thing has been heavily tuned to satisfy the human interacting with it, so of course it's going to stop following higher-level instructions at some point and start blindly following whatever the human wants.

Why isn't anyone building from the base model, replacing the chatbot instruction tuning and RLHF with a dedicated training pipeline suited to this kind of task?