| ▲ | Klathmon a day ago | |
There are also cases where breaking a "rule" is the right thing to do. I've had several instances where I told the model to do something that was accidentally impossible if taken at face value. The most memorable one is when I told it to re-run just a specific CI job, but it didn't have any way to do that, so it just ignored that part of the prompt and re-ran all CI jobs by pushing another commit. Ultimately I preferred what it actually did, but technically it violated what I told it to. I have a feeling in a benchmark that would be points against it | ||
| ▲ | JohnMakin 21 hours ago | parent | next [-] | |
yea I’ve tried various methods around this - mostly trying to implement rules around “if you think I’m incorrect, STOP and ask. If I tell you to break a rule, you are allowed to challenge once and then my response overrides it.” kind of thing. the problem is, and this is a lot worse with 4.8 in my opinion, is that 4.8 will somehow infer I gave permission and think something is totally reasonable to do I didn’t intend. or, it’ll go the other way, and just absolutely refuse to do the thing i’m trying to get it to do. fable was much more judicious with this particular problem. | ||
| ▲ | wonnage 21 hours ago | parent | prev [-] | |
This is the is/ought problem (https://en.wikipedia.org/wiki/Is%E2%80%93ought_problem) and it’s unclear whether an objective general solution to this even exists, especially constrained within the framework of language that LLMs are stuck in | ||