| ▲ | xlayn 6 hours ago | |
The other interesting point is that right now I'm copy-pasting the layers, but a patch in llama.cpp could make the same model behave better simply by following a different "flow" through its existing layers, without needing more VRAM. If this is validated enough, it could eventually lead to shipping some kind of "mix" architecture, with layers executed in an order chosen to fit some "vibe". Devstral was the first one I tried. I optimized for math/eq first, but that didn't result in a better model; then I added the reasoning part, and that resulted in a "better" model. I used that Devstral with the vibe.cli and it looked sharp to me, things didn't fail; I also used the chat to "vibe" check it and it looked OK to me. The other thing is that I picked one particular circuit and it was "good", but I don't know whether it was a local maximum. I think I ran only about 10 sets of the "fast test harness" and picked the config that gave the highest score. Once I have that, I take that model and run it against llm_eval limited to only 50 tests, again for the sake of speed: I didn't want to wait a week to discover the config was bad. | ||
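
To make the workflow above concrete, here is a rough sketch of the two ideas: running a block of layers twice by index (so no weights are copied), and the two-stage "cheap harness first, capped eval second" selection. Everything here is an assumption for illustration: the `(start, end)` config shape, the function names, and the scorers are hypothetical stand-ins, not llama.cpp code and not the actual harness described in the comment.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical config: (start, end) of the layer block that gets executed twice.
Config = Tuple[int, int]


def layer_order(n_layers: int, config: Config) -> List[int]:
    """Return an execution order that runs layers[start:end] twice.

    Only indices are repeated, so the same weights are reused and no
    extra copies of the layers are needed.
    """
    start, end = config
    if not 0 <= start < end <= n_layers:
        raise ValueError(f"invalid block {config} for {n_layers} layers")
    # Run 0..end-1, then jump back to start and continue to the last layer.
    return list(range(end)) + list(range(start, n_layers))


def pick_best(
    configs: Iterable[Config],
    fast_score: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Stage 1: score every candidate with the cheap harness, keep the top one."""
    scored = [(cfg, fast_score(cfg)) for cfg in configs]
    if not scored:
        raise ValueError("no candidate configs given")
    return max(scored, key=lambda pair: pair[1])


def confirm(
    config: Config,
    slow_eval: Callable[[Config, int], float],
    limit: int = 50,
) -> float:
    """Stage 2: re-check only the winner on a slower eval capped at `limit` tests."""
    return slow_eval(config, limit)


if __name__ == "__main__":
    # Toy scorers standing in for the real harnesses (pure assumptions).
    toy_fast = lambda cfg: -abs(cfg[0] - 2) - abs(cfg[1] - 4)
    toy_slow = lambda cfg, limit: float(limit)

    best, score = pick_best([(0, 2), (2, 4), (3, 5)], toy_fast)
    print(best, score, layer_order(6, best), confirm(best, toy_slow))
```

Taking the maximum over roughly 10 sampled configs only tells you which of those 10 was best, which is why the local-maximum worry stands: a wider or repeated sweep would be needed to say more.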