| ▲ | catigula 3 days ago |
| How are you determining that it's better? Care to make a case for it that isn't based on (gameable) benchmarks? |
|
| ▲ | scrollaway 3 days ago | parent [-] |
| By that metric, everything is gameable. Any case we'd make for it would be purely based on vibes (and our take on that would not be any more useful than the general community opinion there). |
| |
| ▲ | yunwal 3 days ago | parent | next [-] |
| > By that metric, everything is gameable |
| Usually in cases like this you would use a testing set created after the model was trained. |
|
| ▲ | catigula 3 days ago | parent | prev [-] |
| So the answer would be no. |
|
| ▲ | scrollaway 3 days ago | parent [-] |
| A benchmark is exactly how you measure things reliably instead of "based on vibes". I really don't understand what you're asking or expecting. |
|
|