solid_fuel an hour ago
> assuming you get to do gradient descent AND the context is fixed+known AND you have unlimited compute? sure; is it a realistic setup?

Clearly nothing so complicated is required, given the prompt in the very article you are commenting on.

> the exact same argument applies to any (sufficiently complex) piece of software, with exactly the same conclusion

Yeah, and the halting problem is hard too, but there are levels to this shit.

> also technically I'd argue that we do know the input/output space (set of all token strings of length <= N/token), and know the mapping (the model is a ~pure function in terms of the api, which is about as good of a representation as it gets for a non-invertible mapping); at least it's much closer than with something like linux

I would argue we don't even know the desired output for most inputs to an LLM, and LLMs certainly aren't trained on every possible input state. But I think Linux and LLMs are sufficiently different that they aren't really directly comparable like this. After all, Linux is not a pure function and has lots of side effects.

But just to establish an order of magnitude: the context window for GPT-3 was 2,048 tokens long, and there were 50,257 tokens in the vocabulary. Counting only full-length inputs, the input space thus has 50,257^2048 unique states, which is approximately 1.12 × 10^9628. That's an awfully big input space for a single function.
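For anyone who wants to check that figure, here is a minimal sketch of the arithmetic using the 2,048-token context and 50,257-token vocabulary quoted above. It works in log10 space rather than printing the full integer; the names `log10_state_count` and `sci_notation` are made up for this illustration.

```python
import math

VOCAB_SIZE = 50_257   # vocabulary size quoted in the comment
CONTEXT_LEN = 2_048   # context length in tokens quoted in the comment


def log10_state_count(vocab: int, length: int) -> float:
    """log10 of vocab**length, i.e. the number of distinct full-length inputs."""
    return length * math.log10(vocab)


def sci_notation(log10_value: float) -> tuple[float, int]:
    """Split a log10 value into (mantissa, exponent) for scientific notation."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


mantissa, exponent = sci_notation(log10_state_count(VOCAB_SIZE, CONTEXT_LEN))
print(f"{mantissa:.2f} x 10^{exponent}")  # prints "1.12 x 10^9628"
```

This counts only inputs of exactly 2,048 tokens. Including every shorter input multiplies the total by roughly V/(V-1) with V = 50,257, which does not change the order of magnitude.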
anuramat an hour ago
> clearly nothing ... is required

this isn't even prompt injection; even if it was, how do you go from "exists" to "for all"?

> we don't know the desired output

then what are we talking about? if you don't know how you want your software to behave, how do you define a bug?

> linux is not a pure function

... which is my point -- it's worse

> to establish an order of magnitude

and for linux?