techpineapple 6 days ago

I wonder if one reason new versions of GPT appear to get better - say, at coding tasks - is just that they have newer knowledge.

When ChatGPT 4 comes out, newer versions of APIs have fewer blog posts / examples / documentation in its training data. Then ChatGPT 5 comes out and seems to solve all the problems ChatGPT 4 had, but of course fails on the newest libraries. Rinse and repeat.

its-kostya 6 days ago

> ... just because they have new knowledge.

This means there is a future where AI is training on data it generated itself, and I worry that might not be sustainable.

jgalt212 6 days ago

A software-based Habsburg jaw, if you will.

techpineapple 6 days ago

I’ve heard of this idea of training on synthetic data, and I wonder what that data is, and whether it increases or decreases hallucinations. Is the goal of training on synthetic data to wear certain paths deeper, or to increase the amount of knowledge / types of data?

Because the second seems vaguely impossible to do.

lazide 6 days ago

This is already occurring, is not sustainable, and produces an effect known as Model Collapse.
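A toy sketch of the effect being described (not how LLM training actually works, just a statistical analogy): if each "generation" of a model is fit only to samples produced by the previous generation, estimation error compounds and the distribution's tails disappear. The Gaussian here and all parameter choices are illustrative assumptions.

```python
# Toy model-collapse simulation: fit a Gaussian to samples, then resample
# from the fit, over and over. Each generation "trains" only on the
# previous generation's synthetic output, and the estimated spread
# drifts toward zero - the distribution collapses.
import random
import statistics

random.seed(0)

def train_on(samples):
    """'Train' a model: estimate mean and std dev from the data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mean, std, n):
    """'Generate' synthetic data from the trained model."""
    return [random.gauss(mean, std) for _ in range(n)]

# Generation 0: "real" data from a standard normal distribution.
data = generate(0.0, 1.0, 50)
initial_std = train_on(data)[1]

# Each later generation sees only the previous generation's output.
for _ in range(2000):
    mean, std = train_on(data)
    data = generate(mean, std, 50)

final_std = train_on(data)[1]
print(f"spread: generation 0 ~ {initial_std:.3f}, "
      f"generation 2000 ~ {final_std:.3g}")
```

Running this, the fitted standard deviation shrinks by orders of magnitude over the generations: small-sample estimation error acts like a random walk with a downward bias, so variance (and with it, rare "knowledge" in the tails) is steadily lost.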