kevindamm | a day ago
I did a quick review of its final answer and it looks like there are logic errors. All three of them get the incorrect max-value bound (even with comments saying 9+9+9+9+3 = 30), so early termination wouldn't happen in the second and third solutions, but that's an optimization detail. The first version would, however, terminate early on the first occurrence of 3999 and take whatever the max value was up to that point. So for many inputs the first one (via solve_digit_sum_difference) is just wrong.

The second implementation (solve_optimized, not a great name either) and the third implementation at least appear to be correct, but that pydoc and the comments in general are atrocious. In a review I would ask for these to be reworded, and would only expect juniors to even include anything similar in a pull request.

I'm impressed that it's able to pick a good line of reasoning, and even if it's wrong about the optimizations it did give a working answer. But in the body of the response and in the code comments it clearly doesn't understand digit extraction per se, despite parroting code about it. I suspect you're right that the model has seen the problem solution before, and is possibly overfitting.

Not bad, but I wouldn't say it crushed it, and I wouldn't accept any of its micro-optimizations without benchmark results, or at least a benchmark test that I could then run. Have you tried the same question with other sums besides 30?
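To make the early-termination point concrete, here is a minimal sketch of what I mean, not the model's code. It assumes the inputs are positive integers no larger than 99,999 (my assumption; adjust `upper` otherwise), and the names `digit_sum`, `extreme_values` and `min_max_difference` are made up for illustration. The loop only stops once both the smallest and the largest possible matches have been seen, rather than on the first 3999.

```python
from typing import Iterable, Optional, Tuple


def digit_sum(n: int) -> int:
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)  # peel off the last digit
        total += digit
    return total


def extreme_values(target: int, upper: int) -> Optional[Tuple[int, int]]:
    """Smallest and largest n in [1, upper] whose digits sum to target.

    Brute force on purpose: the bounds are computed, not hand-derived.
    """
    matches = [n for n in range(1, upper + 1) if digit_sum(n) == target]
    return (matches[0], matches[-1]) if matches else None


def min_max_difference(
    values: Iterable[int], target: int = 30, upper: int = 99_999
) -> Optional[int]:
    """Largest minus smallest value whose digits sum to target.

    `upper` is an assumed cap on the input values (hypothetical default).
    """
    bounds = extreme_values(target, upper)
    if bounds is None:
        return None
    lo_bound, hi_bound = bounds
    smallest = largest = None
    for v in values:
        if digit_sum(v) != target:
            continue
        if smallest is None or v < smallest:
            smallest = v
        if largest is None or v > largest:
            largest = v
        # Safe to stop only when BOTH extremes have been found.
        if smallest == lo_bound and largest == hi_bound:
            break
    if smallest is None:
        return None
    return largest - smallest
```

With `target=30` and `upper=99_999` the computed bounds are 3999 and 99930. An input like `[4899, 3999, 5799]` is the kind of case where stopping at the first 3999 goes wrong, because 5799 comes later and is the real maximum.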
minimaxir | a day ago
Those are fair points. Even with those issues, it's still substantially better than the original benchmark (maybe "crushing it" is too subjective a term). I reran the test on a dataset of 1 to 500,000 with digit sums up to 37, and it went back to the numba JIT implementation that I encountered in my original blog post, without numerology shenanigans. https://gist.github.com/minimaxir/a6b7467a5b39617a7b611bda26... I also ran the model at temp=1, which came to the same solution but confused itself with test cases: https://gist.github.com/minimaxir/be998594e090b00acf4f12d552...