Remix.run Logo
biophysboy 4 hours ago

I thought adversarial testing like this was a routine part of software engineering. He's checking to see how flexible it is. Maybe prompting would help, but it would be cool if it was more flexible.

genrader 34 minutes ago | parent | next [-]

You're correct, however midwit people who don't actually fully understand all of this will latch on to one of the early difficult questions that was shown as an example, and then continued to use that over and over without really knowing what they're doing while the people developing the model and also testing the model are doing far more complex things

Benjammer 3 hours ago | parent | prev [-]

So the idea is what? What's the successful outcome look like for this test, in your mind? What should good software do? Respond and say there are 5 legs? Or question what kind of dog this even is? Or get confused by a nonsensical picture that doesn't quite match the prompt in a confusing way? Should it understand the concept of a dog and be able to tell you that this isn't a real dog?

biophysboy 2 hours ago | parent [-]

No, it’s just a test case to demonstrate flexibility when faced with unusual circumstances