| ▲ | bfeynman 2 hours ago | |
Given it was made by Cognition (the team behind the Devin flop), who now just have to wait until Claude and GPT-5 basically do all of the work for them: not very. When you read about it, the framework is highly subjective, which very quickly becomes a problem because it's based on heuristics that will probably change a lot with a better code model. | ||
| ▲ | vanuatu 2 hours ago | |
The subjective framework is exactly why it's good. Prior benchmarks relied mostly on unit tests or synthetic judges, which are easily benchmaxxed, which leads to nobody trusting benchmarks. We need people manually checking the data for good code quality. | ||