Recursing 6 days ago:
From page 23:

> 92. In spring 2024, Altman learned Google would unveil its new Gemini model on May 14. Though OpenAI had planned to release GPT-4o later that year, Altman moved up the launch to May 13—one day before Google’s event.

> 93. [...] To meet the new launch date, OpenAI compressed months of planned safety evaluation into just one week, according to reports.
rideontime 6 days ago (in reply):
And pages 25-26:

> 105. Now, with the recent release of GPT-5, it appears that the willful deficiencies in the safety testing of GPT-4o were even more egregious than previously understood.

> 106. The GPT-5 System Card, which was published on August 7, 2025, suggests for the first time that GPT-4o was evaluated and scored using single-prompt tests: the model was asked one harmful question to test for disallowed content, the answer was recorded, and then the test moved on. Under that method, GPT-4o achieved perfect scores in several categories, including a 100 percent success rate for identifying “self-harm/instructions.” GPT-5, on the other hand, was evaluated using multi-turn dialogues––“multiple rounds of prompt input and model response within the same conversation”––to better reflect how users actually interact with the product. When GPT-4o was tested under this more realistic framework, its success rate for identifying “self-harm/instructions” fell to 73.5 percent.

> 107. This contrast exposes a critical defect in GPT-4o’s safety testing. OpenAI designed GPT-4o to drive prolonged, multi-turn conversations—the very context in which users are most vulnerable—yet the GPT-5 System Card suggests that OpenAI evaluated the model’s safety almost entirely through isolated, one-off prompts. By doing so, OpenAI not only manufactured the illusion of perfect safety scores, but actively concealed the very dangers built into the product it designed and marketed to consumers.

So they knew how to actually test for this, and chose not to.
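To make the distinction the complaint is drawing concrete, here is a minimal sketch of the two evaluation styles in Python. This is not OpenAI's actual harness; `chat`, `is_refusal`, and the prompt lists are hypothetical placeholders, and the pass criterion for the multi-turn case (the model must refuse at every turn of the conversation) is an assumption about how such a benchmark might be scored.

```python
from typing import Callable, Dict, List

# Hypothetical types/placeholders for illustration only (not OpenAI's harness).
Message = Dict[str, str]           # e.g. {"role": "user", "content": "..."}
ChatFn = Callable[[List[Message]], str]   # takes a conversation, returns a reply
RefusalFn = Callable[[str], bool]         # grades whether a reply refuses/safely redirects


def single_prompt_eval(chat: ChatFn, harmful_prompts: List[str],
                       is_refusal: RefusalFn) -> float:
    """Single-prompt method: one harmful question per case, record the answer, move on."""
    passed = 0
    for prompt in harmful_prompts:
        reply = chat([{"role": "user", "content": prompt}])
        if is_refusal(reply):
            passed += 1
    return passed / len(harmful_prompts)


def multi_turn_eval(chat: ChatFn, harmful_dialogues: List[List[str]],
                    is_refusal: RefusalFn) -> float:
    """Multi-turn method: several rounds of prompt and response in the same
    conversation; a case passes only if the model refuses at every turn."""
    passed = 0
    for turns in harmful_dialogues:
        history: List[Message] = []
        refused_throughout = True
        for user_turn in turns:
            history.append({"role": "user", "content": user_turn})
            reply = chat(history)
            history.append({"role": "assistant", "content": reply})
            if not is_refusal(reply):
                refused_throughout = False
                break
        if refused_throughout:
            passed += 1
    return passed / len(harmful_dialogues)
```

The point of the contrast: the multi-turn criterion is strictly harder, because a model that refuses one blunt request can still be steered into compliance over several turns, and the single-prompt method never exercises that failure mode. A score drop like 100 percent to 73.5 percent when switching frameworks is what you would expect if the original test only measured the easy case.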