| ▲ | mariopt 2 hours ago | |
The methodology used is quite naive. I've used GLM 5.1 on fairly advanced crackme challenges (example: https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82), and to my surprise it was able to patch binaries, do runtime analysis, bypass anti-debug techniques, etc. Expecting the model to do everything by itself is unrealistic; I found that working alongside the model works really well. I'm not talking about spoiling the solution, just telling it which direction to explore. Chinese models are much more capable than people give them credit for, but Claude/Codex won the marketing game. The only use case for this methodology would be CI integration, which can be nice, but I think security reviews still need human attention and expertise. | ||
| ▲ | jc4p 2 hours ago | parent [-] | |
Thank you for your note! As I mention in the post, this is not scientific at all. I'm very curious: how would you do multiple runs of multiple models in a "work alongside the model" manner? | ||