margalabargala 5 hours ago
And therefore it scores worse on benchmarks?
XCSme 5 hours ago
Also, Claude/Fable models are quite bad at instruction following: https://artificialanalysis.ai/evaluations/ifbench
XCSme 5 hours ago
On some it does, yes, and also in real usage. It avoided answering 2 of the 21 tests in this specific benchmark, so its maximum possible score is already capped at about 90% (19/21).