Remix.run Logo
remexre 3 days ago

Or more like,

    x = tokenize(input)
    i = 0
    do {
      finish, x = layers(x)
    } while(!finish && i++ < t_max);
    output = lm_head(x)
oofbey 3 days ago | parent [-]

That’s closer still. But even closer would be:

    x = tokenize(input)
    i = 0
    finish = 0
    do {
      p, x = layers(x)
      finish += p
    } while(finish < 0.95 && i++ < t_max);
    output = lm_head(x)
Except the accumulation of the stop probabilities isn’t linear like that - it’s more like a weighted coin model.