fruitworks 2 hours ago
I am not convinced either of these is a good test prompt for generic complexity tasks. Many solutions have already been included in the training data! You can trivially produce a web browser by copying and compiling the code for Firefox, no transformer needed.
baxtr an hour ago
It can still be a good capability test. Building a car is a real-world equivalent. It's highly complex and has been done billions of times. Still hard to pull off, if you ask me.
Choco31415 2 hours ago
But that would produce Firefox. The goal with these tests is to see if the models can make something new, not just copy an existing solution. That is the goal, at least.