johndough | a day ago
All the naysayers here clearly have no idea. Your large matrix multiplication implementation is quite impressive! I set up a benchmark loop (sketched below) and let GPT-5.1-Codex-Max experiment for a bit (not 5.2/Opus/Gemini, because they are broken in Copilot), but it seems to be missing something crucial. With a bit of encouragement, it has implemented:
But yours is still easily 25% faster. Would you be willing to write a bit about how you set up your evaluation and which tricks Claude used to solve it?
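For context, the loop itself is nothing exotic: rebuild the candidate, run it, keep the best time, and feed the number back to the agent. Below is a minimal sketch of the measurement side only; the matrix size, rep count, build flags, and the matmul_candidate name are placeholders, not the actual harness.

    /* bench.c -- measurement side of the loop; the agent edits
       matmul_candidate(), the harness rebuilds and reports GFLOP/s.
       Build (placeholder flags): gcc -O3 -march=native bench.c -o bench */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>

    #define N    1024   /* square, row-major matrices (placeholder size) */
    #define REPS 5      /* report the best of several runs */

    /* Function under test; starts as a naive baseline the agent replaces. */
    static void matmul_candidate(const float *A, const float *B, float *C)
    {
        for (int i = 0; i < N; i++)
            for (int k = 0; k < N; k++) {
                float a = A[i * N + k];
                for (int j = 0; j < N; j++)
                    C[i * N + j] += a * B[k * N + j];
            }
    }

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        float *A = malloc(sizeof(float) * N * N);
        float *B = malloc(sizeof(float) * N * N);
        float *C = malloc(sizeof(float) * N * N);
        for (size_t i = 0; i < (size_t)N * N; i++) {
            A[i] = (float)rand() / RAND_MAX;
            B[i] = (float)rand() / RAND_MAX;
        }

        double best = INFINITY;
        for (int r = 0; r < REPS; r++) {
            for (size_t i = 0; i < (size_t)N * N; i++) C[i] = 0.0f;
            double t0 = now_sec();
            matmul_candidate(A, B, C);
            double dt = now_sec() - t0;
            if (dt < best) best = dt;
        }

        /* 2*N^3 floating-point ops per multiply-accumulate matmul */
        printf("best: %.3f s  %.2f GFLOP/s\n", best,
               2.0 * N * N * N / (best * 1e9));
        free(A); free(B); free(C);
        return 0;
    }

The rebuild/rerun driver and the correctness check against a reference matmul sit outside this sketch.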
josu | 7 hours ago
Thank you. Yeah, I'm doing all those things, which do get you close to the top. The rest of what I'm doing is mostly micro-optimizations, such as finding a way to avoid the AVX→SSE transition penalty (1-2% improvement). But I don't want to spoil the fun. The agents are really good at searching the web now, so posting the tricks here would basically break the challenge. For example, ChatGPT was able to find Matt's blog post regarding Task 1, and that's what gave me the largest jump: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa... Interestingly, it seems that Matt's post is not in the training data of any of the major LLMs.
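To illustrate what I mean by that penalty in generic terms (this is the textbook pattern, not code from my solution): on many Intel cores, running legacy non-VEX SSE instructions while the upper halves of the YMM registers are still dirty from 256-bit AVX work triggers an expensive state transition, and issuing vzeroupper first avoids it.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for a routine built without AVX (say, an old library). In real
       code it would live in a separate object file compiled without -mavx, so
       its SSE instructions use the legacy (non-VEX) encoding. */
    static void legacy_sse_sum(const float *p, size_t n, float *out)
    {
        __m128 acc = _mm_setzero_ps();
        for (size_t i = 0; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
        float t[4];
        _mm_storeu_ps(t, acc);
        *out = t[0] + t[1] + t[2] + t[3];
    }

    int main(void)
    {
        float buf[1024];
        for (int i = 0; i < 1024; i++) buf[i] = 1.0f;

        /* 256-bit AVX work leaves the upper halves of the YMM registers dirty. */
        const __m256 two = _mm256_set1_ps(2.0f);
        for (int i = 0; i + 8 <= 1024; i += 8)
            _mm256_storeu_ps(buf + i,
                             _mm256_mul_ps(_mm256_loadu_ps(buf + i), two));

        /* Clear the dirty upper state before any legacy SSE code runs.
           Compilers building with -mavx usually insert vzeroupper around calls
           themselves, so the explicit intrinsic mostly matters in hand-written
           kernels and asm. */
        _mm256_zeroupper();

        float sum = 0.0f;
        legacy_sse_sum(buf, 1024, &sum);
        printf("sum = %.1f\n", sum);   /* 1024 * 2.0 = 2048.0 */
        return 0;
    }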