getnormality 4 hours ago

What you're suggesting seems to go implausibly far beyond what the paper says.

RL post-training alters the parameters of the transformer, while your f(manifold) idea seems to suggest that a new layer on top would suffice, with no need to alter the transformer itself at all.

It would be extremely handy if that were so, but I'm guessing it isn't, or it would be the prevailing approach.

wrs an hour ago

The manifold is in the middle (“small input space is expanded onto a big manifold and contracted again”) so f(manifold) would need to be in the middle too.
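
To make the "in the middle" versus "on top" distinction concrete, here is a rough toy sketch, not anything from the paper. The weights are random stand-ins for a frozen model, and `forward`, `f_mid`, `f_top`, and `scale_first_half` are hypothetical names made up for illustration.

```python
import random

random.seed(0)

D_IN, D_HIDDEN = 2, 8  # toy sizes, chosen only for illustration

# Frozen "pretrained" weights: random stand-ins, not a real model.
W_UP = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_HIDDEN)]
W_DOWN = [[random.gauss(0, 1) for _ in range(D_HIDDEN)] for _ in range(D_IN)]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(x, f_mid=None, f_top=None):
    h = relu(matvec(W_UP, x))   # small input expanded into the big hidden space
    if f_mid is not None:
        h = f_mid(h)            # hypothetical f applied in the middle
    y = matvec(W_DOWN, h)       # contracted back down to the small space
    if f_top is not None:
        y = f_top(y)            # hypothetical extra layer on top of the output
    return y


def scale_first_half(h):
    """Made-up f: double the first half of the hidden coordinates."""
    return [2.0 * a if i < len(h) // 2 else a for i, a in enumerate(h)]


x = [0.5, -1.0]
base = forward(x)                                       # unmodified model
mid = forward(x, f_mid=scale_first_half)                # f acts on the hidden state
top = forward(x, f_top=lambda y: [2.0 * a for a in y])  # f only sees the output
```

In this sketch `f_top` only ever sees the 2 contracted numbers, while `f_mid` gets all 8 hidden coordinates before they are projected back down. Neither variant touches `W_UP` or `W_DOWN`, which is the part that differs from RL post-training.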