Sohcahtoa82 2 days ago

> I have seen the argument that LLMs can only give you what its been trained on, i.e. it will not be "creative" or "revolutionary", that it will not output anything "new", but "only what is in its corpus".

LLMs can absolutely create things that are creative, at least for some definition of "creative".

For example, I can ask an LLM to create a speech about cross-site scripting in the style of Donald Trump:

> Okay, folks, we're talking about Cross-Site Scripting, alright? I have to say, it's a bit confusing, but let's try to understand it. They call it XSS, which is a fancy term. I don't really know what it means, but I hear it's a big deal in the tech world. People are talking about it, a lot of people, very smart people. So, Cross-Site Scripting. It's got the word "scripting" in it, which sounds like it's about writing, maybe like a script for a movie or something. But it's on the internet, on these websites, okay? And apparently, it's not good. I don't know exactly why, but it's not good. Bad things happen, they tell me. Maybe it makes the website look different, I don't know. Maybe it makes things pop up where they shouldn't. Could be anything! But here's what I do know. We need to do something about it. We need to get the best people, the smartest people, to look into it. We'll figure it out, folks. We'll make our websites safe, and we'll do it better than anyone else. Trust me, it'll be tremendous. Thank you.

Certainly there's no text out there that contains a speech about XSS from Trump. There are some snippets here and there that likely sound like Trump, but a Markov Chain is simply incapable of producing anything like this.

0cf8612b2e1e 2 days ago | parent | next [-]

Sure, that specific text does not exist, but the discrete tokens that went into it would have.

If you similarly trained a Markov chain at the token level on an LLM-sized corpus, it could produce the same thing. Lacking an attention mechanism, the token probabilities would be terribly unhelpful for the effort, but it is not impossible.

Sohcahtoa82 2 days ago | parent [-]

Let's assume three things here:

1. The corpus contains every Trump speech.

2. The corpus contains everything ever written about XSS.

3. The corpus does NOT contain Trump talking about XSS, nor really anything that puts "Trump" and "XSS" within the same page.

A Markov Chain could not produce a speech about XSS in the style of Trump. The greatest tuning factor for a Markov Chain is the context length. A short length (like 2-4 words) produces incoherent results because it only looks at the last 2-4 words when predicting the next word. This means that if you prompted the chain with "Create a speech about cross-site scripting in the style of Donald Trump", then even with a 4-word context, all the model processes is "style of Donald Trump". By the time it reaches the end of the prompt, it has already forgotten the beginning of it.

If you increase the context to 15 words, then the chain would produce nothing, because "Create a speech about cross-site scripting in the style of Donald Trump" has never appeared in its corpus, so there's no data for what to generate next.

The matching in a Markov Chain is discrete. It's purely a mapping of (series of tokens) -> (list of possible next tokens). If you pass in a series of tokens that was never seen in the training set, then the list of possible next tokens is an empty set.
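
A minimal sketch of that discrete lookup, for the skeptical (word-level, order-4; the function names here are illustrative, not an existing library). Note how an unseen context gives an empty candidate set and generation simply stops:

    import random
    from collections import defaultdict, Counter

    def build_chain(corpus_words, context_len=4):
        # Map (tuple of context_len words) -> Counter of observed next words.
        chain = defaultdict(Counter)
        for i in range(len(corpus_words) - context_len):
            context = tuple(corpus_words[i:i + context_len])
            chain[context][corpus_words[i + context_len]] += 1
        return chain

    def generate(chain, prompt_words, context_len=4, max_words=50):
        out = list(prompt_words)
        for _ in range(max_words):
            context = tuple(out[-context_len:])
            candidates = chain.get(context)
            if not candidates:
                # Context never seen in training: the "list of possible
                # next tokens" is empty, so there is nothing to emit.
                break
            words, counts = zip(*candidates.items())
            out.append(random.choices(words, weights=counts)[0])
        return " ".join(out)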

johnisgood 19 hours ago | parent | next [-]

An LLM, though, should be able to produce a speech about XSS in the style of Trump, assuming it knows enough about both "XSS" and "Trump"; that alone is sufficient.

0cf8612b2e1e a day ago | parent | prev [-]

At the token level, not the word level, it would be possible for a Markov chain. It never has to know about Trump or XSS, only that it has seen tokens like “ing”, “ed”, “is”, and so forth. Given an LLM-sized corpus, which will contain ~all token-to-token pairs with some non-zero frequency, the above could be generated.

The actual probabilities will be terrible, but it is not impossible.
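
A rough sketch of that point (function names are illustrative): under an order-1 token model, any target sequence whose adjacent token pairs all appear somewhere in the corpus has non-zero probability; it is just astronomically small.

    import math
    from collections import defaultdict, Counter

    def bigram_log_prob(target_tokens, corpus_tokens):
        # Count token-to-token transitions seen in the corpus.
        transitions = defaultdict(Counter)
        for a, b in zip(corpus_tokens, corpus_tokens[1:]):
            transitions[a][b] += 1

        # Log-probability of the target under the order-1 Markov model.
        # Non-zero as long as every adjacent token pair in the target was
        # seen at least once -- "not impossible", just absurdly unlikely.
        log_p = 0.0
        for a, b in zip(target_tokens, target_tokens[1:]):
            total = sum(transitions[a].values())
            count = transitions[a][b]
            if count == 0:
                return float("-inf")  # a pair the corpus never contained
            log_p += math.log(count / total)
        return log_p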

johnisgood 2 days ago | parent | prev [-]

Oh, of course. What I wanted answered did not have much to do with Markov chains, but with LLMs, because I have often seen this argument made against LLMs.