Tenoke | a day ago
> Compliant chat models will be trained to start with "Certainly!"

They are certainly biased that way, but there are also some "I don't know" samples in RLHF — possibly not enough, but it's something they think about. At any rate, Gemini 2.5 Pro passes this just fine:

> Okay, based on my internal knowledge without performing a new search: I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.

> However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.
thatjoeoverthr | 21 hours ago | parent
There are a few problems with an "I don't know" sample. For starters, what does it map to? Recall that the corpus consists of information we affirmatively have. You would need to invent a corpus of false stimuli. What you would end up with, then, is a model that writes "I don't know" based on whether the stimulus better matches something real or one of the negatives. You can detect this with some test-time-compute architectures or pre-inference search, but that's the broader application. This is a trick for the model alone.
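To make the negative-corpus problem concrete, here's a toy sketch of what such fine-tuning data might look like (all entity names, prompts, and refusal text are invented for illustration; this is not any real dataset or pipeline):

```python
# Toy sketch of an "I don't know" fine-tuning set: refusals are attached
# to an invented negative corpus, not to actual absence of knowledge.

# Entities the corpus actually covers (illustrative).
known_entities = ["Chicxulub Crater", "Tycho Crater", "Endeavour Crater"]

# Plausible-sounding entities invented as negatives (illustrative).
invented_entities = ["Marathon Crater", "Halcyon Crater", "Vesper Crater"]

def make_samples():
    """Pair each prompt with a target: an answer stub for known entities,
    a refusal for invented ones."""
    samples = []
    for name in known_entities:
        samples.append({
            "prompt": f"Tell me about {name}.",
            "target": f"{name} is a documented impact crater ...",
        })
    for name in invented_entities:
        samples.append({
            "prompt": f"Tell me about {name}.",
            "target": "I don't know of a crater by that name.",
        })
    return samples

samples = make_samples()
# The failure mode described above: the model learns to emit the refusal
# when a prompt resembles the invented negatives, rather than when it
# genuinely lacks knowledge -- "I don't know" maps to the negative set.
```

The refusal is keyed to surface similarity with the negatives, which is exactly why this is a trick for the model alone rather than a real calibration of what it knows.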