DrProtic 3 hours ago
Seems like a benchmark for how good a model is at vibe coding. Your prompt is extremely slim, yet you score it on a bunch of features.
guilamu 3 hours ago
Yes, the prompt is slim by design. I might be wrong, but the point was to see what the model can do "on its own". The eval prompt is quite extensive: https://github.com/guilamu/llms-wordpress-plugin-benchmark/b...