perlgeek 5 days ago

GPT-5 simply sucks at some things. The very first thing I asked it to do was to give me an image of a knife with a spiral Damascus pattern; it gave me an image of such a knife, but with two handles at a right angle: https://chatgpt.com/share/689506a7-ada0-8012-a88f-fa5aa03474...

Then I asked it to give me the same image but with only one handle; as a result, it removed one of the pins from a handle, but the knife still had two handles.

It's not surprising that a new version of such a versatile tool has edge cases where it's worse than the previous version (though if it failed at the very first task I gave it, I wonder how edge that case really was). That's why you shouldn't just switch everybody over without a grace period or any choice.

The old chatgpt didn't have a problem with that prompt.

For something so complicated, it's not surprising that a major new version has some worse behaviors, which is why I wouldn't deprecate all the old models so quickly.

zaptrem 5 days ago | parent | next [-]

The image model (GPT-Image-1) hasn’t changed

orphea 5 days ago | parent | next [-]

Yep, GPT-5 doesn't output images: https://platform.openai.com/docs/models/gpt-5

perlgeek 5 days ago | parent | prev [-]

Then why does it produce different output?

simonw 5 days ago | parent | next [-]

It works as a tool. The main model (GPT-4o or GPT-5 or o3 or whatever) composes a prompt and passes that to the image model.

This means different top level models will get different results.

You can ask the model to tell you the prompt that it used, and it will answer, but there is no way of being 100% sure it is telling you the truth!

My hunch is that it is telling the truth though, because models are generally very good at repeating text from earlier in their context.
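
(For illustration, a rough sketch of what such a tool definition could look like, using OpenAI-style function calling; the name and schema below are hypothetical, not the actual leaked ChatGPT definition:)

    # Hypothetical sketch: the real internal tool name and schema are only
    # known from leaked system prompts; this just mirrors the general shape
    # of an OpenAI-style function-calling tool definition.
    image_tool = {
        "type": "function",
        "function": {
            "name": "image_gen",  # hypothetical name
            "description": "Generate an image from a text prompt.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {
                        "type": "string",
                        "description": "Prompt composed by the chat model "
                                       "and passed on to the image model.",
                    },
                },
                "required": ["prompt"],
            },
        },
    }

The key point is that the top-level model writes the prompt string itself, so GPT-4o and GPT-5 can hand noticeably different prompts to the same unchanged image model.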

slickytail 5 days ago | parent [-]

Source for this? My understanding was that this was true for dalle3, but that the autoregressive image generation just takes in the entire chat context — no hidden prompt.

simonw 5 days ago | parent [-]

Look at the leaked system prompts and you'll see the tool definition used for image generation.

slickytail 5 days ago | parent [-]

I stand corrected! Thanks.

seba_dos1 5 days ago | parent | prev [-]

You know that unless you control for seed and temperature, you always get a different output for the same prompts even with the model unchanged... right?

carlos_rpn 5 days ago | parent | prev | next [-]

Somehow I copied your prompt and got a knife with a single handle on the first try: https://chatgpt.com/s/m_689647439a848191b69aab3ebd9bc56c

Edit: ChatGPT translated the prompt from English to Portuguese when I copied the share link.

hirvi74 5 days ago | parent | next [-]

I think that is one of the most frustrating issues I currently face when using LLMs. One can send the same prompt in two separate chats and receive two drastically different responses.

dymk 5 days ago | parent [-]

It is frustrating that it’ll still give a bad response sometimes, but I consider the variation in responses a feature. If it’s going down the wrong path, it’s nice to be able to roll the dice again and get it back on track.

techpineapple 5 days ago | parent | prev [-]

I’ve noticed inconsistencies like this too. Everyone said it couldn’t count the b’s in blueberry, but it worked for me on the first try, so I thought it was just haters; then I played with a few other variations and saw it fail. (Famously, it couldn’t count the r’s in strawberry.)

I guess we know it’s non-deterministic, but there must be some pretty basic randomization in there somewhere, maybe around tuning its creativity?

seba_dos1 5 days ago | parent [-]

Temperature is a very basic concept that makes LLMs work as well as they do in the first place. That's just how it works and that's how it's always been supposed to work.
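
(For readers unfamiliar with the term, a minimal sketch of temperature sampling in Python; illustrative only, not how any particular OpenAI model is implemented. Scaling the logits before the softmax and then sampling means the same prompt can produce different outputs on every run:)

    import numpy as np

    # Minimal illustrative sketch of temperature sampling: scale the logits
    # before the softmax, then draw a token at random from the resulting
    # distribution. Any temperature > 0 makes generation non-deterministic.
    def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
        scaled = logits / temperature             # lower T -> sharper distribution
        probs = np.exp(scaled - scaled.max())     # numerically stable softmax
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    # Run this twice and you may well get different tokens back.
    print(sample_token(np.array([2.0, 1.5, 0.5]), temperature=0.8))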

vunderba 5 days ago | parent | prev | next [-]

To ensure that GPT-5 funnels the image to the SOTA model `gpt-image-1`, click the plus sign and select "Create Image". There will likely still be some inherent prompt enrichment happening, since GPT-5 uses `gpt-image-1` as a tool. Outside of using the API, I'm not sure there is a good way to prevent this.

Prompt: "A photo of a kitchen knife with the classic Damascus spiral metallic pattern on the blade itself, studio photography"

Image: https://imgur.com/a/Qe6VKrd
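
(For comparison, a minimal sketch of calling gpt-image-1 directly, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; going through the API skips the chat model's tool-calling layer entirely:)

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Calling gpt-image-1 directly bypasses the chat model, so the prompt
    # below is, as far as the public API shows, what the image model
    # actually receives.
    result = client.images.generate(
        model="gpt-image-1",
        prompt=(
            "A photo of a kitchen knife with the classic Damascus spiral "
            "metallic pattern on the blade itself, studio photography"
        ),
        size="1024x1024",
    )
    print(result.data[0].b64_json[:80])  # base64-encoded image data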

joaohaas 5 days ago | parent | prev | next [-]

Yes, it sucks

But GPT-4 would have the same problems, since it uses the same image model

chrismustcode 5 days ago | parent | prev | next [-]

The image model is literally the same model

minimaxir 5 days ago | parent | prev [-]

So there may be something weird going on with images in GPT-5, which OpenAI avoided discussing in the livestream. The artist for SMBC noted that GPT-5 was better at plagiarizing his style: https://bsky.app/profile/zachweinersmith.bsky.social/post/3l...

However, there have been no updates to the underlying image model (gpt-image-1). But because the image generation is autoregressive, with GPT generating tokens that are then decoded by the image model (in contrast to diffusion models), an update to the base LLM token generator could incorporate new images as training data without having to retrain the downstream image model on them.
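
(A toy sketch of the autoregressive pipeline being hypothesized here; every class and function name is made up for illustration and does not correspond to any published OpenAI interface:)

    import random

    class ToyTokenLLM:
        """Stands in for the chat model: emits discrete image-token ids."""
        def next_image_token(self, prompt, tokens_so_far):
            random.seed(hash((prompt, len(tokens_so_far))) % 2**32)
            return random.randrange(256)

    class ToyImageDecoder:
        """Stands in for the fixed image model that turns tokens into pixels."""
        def decode(self, tokens):
            return [[t] * 4 for t in tokens]  # pretend each token becomes 4 pixels

    def generate_image(llm, decoder, prompt, n_tokens=8):
        tokens = []
        # The LLM emits image tokens autoregressively, conditioned on the
        # prompt and the tokens emitted so far; a separate decoder then turns
        # the token sequence into pixels.
        for _ in range(n_tokens):
            tokens.append(llm.next_image_token(prompt, tokens))
        return decoder.decode(tokens)

    pixels = generate_image(ToyTokenLLM(), ToyImageDecoder(), "damascus knife")
    print(len(pixels), "pixel blocks")

In this picture, updating the token-generating model changes the emitted token sequence even if the decoder itself is never retrained, which is the possibility being raised.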

simonw 5 days ago | parent [-]

No, those changes are going to be caused by the top level models composing different prompts to the underlying image models. GPT-5 is not a multi-modal image output model and still uses the same image generation model that other ChatGPT models use, via tool calling.

GPT-4o was meant to be a multi-modal image output model, but they ended up shipping that capability as a separate model rather than exposing it directly.

minimaxir 5 days ago | parent [-]

That may be a more precise interpretation given the leaked system prompt, as the schema for the tool there includes a prompt: https://news.ycombinator.com/item?id=44832990