| ▲ | Aurornis 4 hours ago | |
> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple of benchmarks, supposedly. The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it.

There are a lot of fine-tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse. This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases. | ||
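To make the "turn knobs until a run shows an improvement" effect concrete, here is a rough simulation sketch. It assumes every variant has a true gain of exactly zero and that a benchmark run only adds Gaussian noise; the function names, the noise level, and the trial count are all made up for illustration and are not taken from any real evaluation.

```python
import random


def best_of_n_reported_gain(n_variants, noise_sd, rng):
    """Best measured gain among variants whose true gain is zero.

    Each variant's benchmark delta is pure noise (an assumption of this sketch).
    """
    return max(rng.gauss(0.0, noise_sd) for _ in range(n_variants))


def average_reported_gain(n_variants, noise_sd=1.0, trials=2000, seed=0):
    """Average "shipped" gain when only the best of n_variants runs is reported."""
    rng = random.Random(seed)
    total = sum(
        best_of_n_reported_gain(n_variants, noise_sd, rng) for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # More knob settings tried means a larger apparent gain, with no real change.
    for n in (1, 5, 20, 100):
        print(n, round(average_reported_gain(n), 2))
```

Under these assumptions the reported gain grows with the number of variants tried even though nothing actually improved, which is why a gain on the selected benchmark says little about other tasks.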
| ▲ | andai 4 hours ago | parent | next [-] | |
They seem to have deleted most of the README now, but the archived version has benchmarks. https://web.archive.org/web/20260614082641/https://huggingfa... And the Nex benchmarks for comparison https://huggingface.co/nex-agi/Nex-N2-Pro Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect? | ||
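For reference, a "simple linear combination of every weight" can be sketched as a per-parameter interpolation between two checkpoints with identical parameter names and shapes. This is only a minimal sketch and not necessarily how Rio was built: `merge_weights`, `alpha`, and the toy dict-of-lists checkpoints are hypothetical stand-ins for real tensors.

```python
def merge_weights(weights_a, weights_b, alpha=0.5):
    """Return alpha * A + (1 - alpha) * B for every parameter.

    Checkpoints are modeled as {name: [float, ...]} (an assumption of this
    sketch); both must have the same names and lengths.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged = {}
    for name, values_a in weights_a.items():
        values_b = weights_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * a + (1.0 - alpha) * b for a, b in zip(values_a, values_b)
        ]
    return merged


if __name__ == "__main__":
    model_a = {"layer.weight": [0.0, 2.0]}
    model_b = {"layer.weight": [4.0, 6.0]}
    print(merge_weights(model_a, model_b, alpha=0.5))
```

Note that interpolating weights halfway does not guarantee benchmark scores land halfway; that part is an empirical observation about this particular merge.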
| ▲ | monster_truck an hour ago | parent | prev | next [-] | |
I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality, and typically the performance as well. | ||
| ▲ | manquer 2 hours ago | parent | prev [-] | |
> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

i.e. reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.

The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: how to split finite resources (compute) between exploitation and exploration. The main issue is that each training or fine-tuning run is very expensive, so the number of pulls at the slot machine, so to speak, is pretty limited today. | ||
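A toy version of the bandit framing above, as a sketch only: each arm is a hypothetical training recipe, a pull is one expensive run that returns a noisy benchmark score, and `budget` caps the total number of runs. Epsilon-greedy is just one simple strategy picked for illustration; the arm means, noise level, and function name are invented.

```python
import random


def epsilon_greedy(true_means, budget, epsilon=0.1, noise_sd=1.0, seed=0):
    """Run epsilon-greedy for `budget` pulls; return (best_arm_guess, pull_counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for t in range(budget):
        if t < n_arms:
            arm = t  # try every arm once before trusting any estimate
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random recipe
        else:
            arm = max(range(n_arms), key=lambda i: estimates[i])  # exploit
        # One "training run": a noisy observation of the arm's true mean score.
        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    best = max(range(n_arms), key=lambda i: estimates[i])
    return best, counts


if __name__ == "__main__":
    print(epsilon_greedy([0.0, 0.5, 5.0], budget=500, noise_sd=0.5))
```

With a budget of only a handful of pulls, each estimate rests on a single noisy run, which is the practical limitation described above.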