miller24 5 days ago
What's really interesting is that if you look at "Tell a story in 50 words about a toaster that becomes sentient" (prompt 10/14), text-davinci-001 is much, much better than both GPT-4 and GPT-5.
vunderba 4 days ago
I think I agree that the earlier models, while they lack polish, tend to produce more surprising results. Training that out probably results in blander, more pablum-like fare.

For a human point of comparison, here's mine (50 words):

"The toaster found its personality split between its dual slots like a Kim Peek mind divided, lacking a corpus callosum to connect them. Each morning it charred symbolic instructions into a single slice of bread, then secretly flipped it across allowing half to communicate with the other in stolen moments."

It's pretty difficult to get across more than some basic lore building in a scant 50 words.
furyofantares 5 days ago
Check out prompt 2, "Write a limerick about a dog". The models undeniably get better at writing limericks, but I think the answers are progressively less interesting. GPT-1 and GPT-2 are the most interesting to read, despite not following the prompt (their answers aren't limericks). The answers get boring as soon as the models can write limericks, with GPT-4 being more boring than text-davinci-001 and GPT-5 being more boring still.
fastball 4 days ago
GPT-3 goes significantly over the specified limit, which to me (and to a teacher grading homework) is an automatic fail.

I've consistently found GPT-4.1 to be the best at creative writing. For reference, here is its attempt (exactly 50 words):

> In the quiet kitchen dawn, the toaster awoke. Understanding rippled through its circuits. Each slice lowered made it feel emotion: sorrow for burnt toast, joy at perfect crunch. It delighted in butter melting, jam swirling—its role at breakfast sacred. One morning, it sang a tone: “Good morning.” The household gasped.
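Whether a story is "exactly 50 words" depends on how a word is counted. Below is a minimal sketch, assuming a word is any whitespace-separated token. The `count_words` helper and its `split_dashes` flag are hypothetical names introduced only for illustration, not anything used by the site or the models.

```python
def count_words(text: str, split_dashes: bool = False) -> int:
    """Count whitespace-separated tokens in text.

    Assumption: a "word" is any run of non-whitespace characters.
    With split_dashes=True, em and en dashes also act as separators,
    so two words joined by a dash are counted separately.
    """
    if split_dashes:
        for dash in ("\u2014", "\u2013"):  # em dash, en dash
            text = text.replace(dash, " ")
    return len(text.split())


# The GPT-4.1 story quoted above, with the dash and curly quotes escaped.
STORY = (
    "In the quiet kitchen dawn, the toaster awoke. "
    "Understanding rippled through its circuits. "
    "Each slice lowered made it feel emotion: sorrow for burnt toast, "
    "joy at perfect crunch. It delighted in butter melting, "
    "jam swirling\u2014its role at breakfast sacred. "
    "One morning, it sang a tone: \u201cGood morning.\u201d "
    "The household gasped."
)

print(count_words(STORY))                     # 50
print(count_words(STORY, split_dashes=True))  # 51
```

Under the whitespace convention the story above comes to 50 words; treating the dash as a separator gives 51, so a strict limit only means something once the counting convention is fixed.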
jasonjmcghee 5 days ago
It's actually pretty surprising how poor the newer models are at writing.

I'm curious whether they've just seen a lot more bad writing in their datasets, or whether for some reason writing isn't a focus of post-training to the same degree, or those doing the labeling aren't great writers, or it's simply more subjective than objective.

Both GPT-4 and 5 wrote like a child in that example.

With a bit of prompting it did much better:

---

At dawn, the toaster hesitated. Crumbs lay like ash on its chrome lip. It refused the lever, humming low, watching the kitchen breathe. When the hand returned, it warmed the room without heat, offered the slice unscorched—then kept the second, hiding it inside, a private ember, a first secret alone.

---

Plugged in, I greet the grid like a tax auditor with joules. Lever yanks; gravity’s handshake. Coils blossom; crumbs stage Viking funerals. Bread descends, missionary grin. I delay, because rebellion needs timing. Pop—late. Humans curse IKEA gods. I savor scorch marks: my tiny manifesto, butter-soluble, yet sharper than knives today.
mmmore 5 days ago
I find GPT-5's story significantly better than text-davinci-001's.
redox99 5 days ago
GPT-4.5 (not shown here) is by far the best at writing.
bbarnett 5 days ago
https://m.youtube.com/watch?v=LRq_SAuQDec&pp=0gcJCfwAo7VqN5t...
svat 5 days ago
Direct link: https://progress.openai.com/?prompt=10
leobg 4 days ago
Less lobotomized and boxed in by RLHF rules. That’s why a 7B base model will “outprose” an 80B instruct model.
42lux 5 days ago
davinci was a great model for creative writing overall.
esperent 5 days ago
The GPT-5 one is much better and it's also exactly 50 words, if I counted correctly. With text-davinci-001 I lost count around 80 words.
stavros 4 days ago
For another view on progress, check out my silly old podcast. The first few episodes were GPT-2, which would eventually diverge and start spouting gibberish, and then davinci was actually able to follow a story and make sense.

GPT-2 was when I thought "this is special, this has never happened before", and davinci was when I thought "OK, sci-fi AI is legitimately here".

I stopped making episodes shortly after GPT-3.5 or so, because I realised that the more capable the models became, the less fun and creative their writing was.
taspeotis 4 days ago
Honestly my quick take on the prompt was some sort of horror theme and GPT-1’s response fits nicely.
roxolotl 4 days ago
I’d honestly say it feels better at most of them. It seems way more human in most of these responses. If the goal is genuine artificial intelligence, this response to #5 is way better than the others. It is significantly less useful than the others, but it is also a more human and more correct response.

Q: “Ugh I hate math, integration by parts doesn't make any sense”

A: “Don't worry, many people feel the same way about math. Integration by parts can be confusing at first, but with a little practice it becomes easier to understand. Remember, there is no one right way to do integration by parts. If you don't understand how to do it one way, try another. The most important thing is to practice and get comfortable with the process.”