| ▲ | NitpickLawyer 4 hours ago | |
Agreed. Still tough, but my point was that we're starting to see that combining methods works. The models are now good enough to create rubrics for judgement stuff. Once you have rubrics you have better judgements. The models are also better at taking pages / chapters from books and "judging" based on those (think logic books, etc). The key is that capabilities become additive, and once you unlock something, you can chain that with other stuff that was tried before. That's why test time + longer context -> IMO improvements on stuff like theorem proving. You get to explore more, combine ideas and verify at the end. Something that was very hard before (i.e. very sparse rewards) becomes tractable. | ||