brcmthrowaway 5 days ago
Do LLMs still use backprop?
samsartor 4 days ago
Yes. Pretraining and fine-tuning use standard Adam optimizers (usually with weight decay). Reinforcement learning has historically been the odd one out, but these days almost all RL algorithms also use backprop and gradient descent.
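
A rough sketch of what a single pretraining step looks like in PyTorch; the toy model, batch shape, and hyperparameters are all made up for illustration, not anyone's actual training code:

    # Illustrative sketch: one step of next-token pretraining
    # with backprop + AdamW (Adam with decoupled weight decay).
    import torch
    import torch.nn as nn

    vocab, dim = 50_000, 512
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

    tokens = torch.randint(0, vocab, (8, 128))   # fake batch of token ids
    logits = model(tokens[:, :-1])               # forward pass
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    loss.backward()                              # backprop (reverse-mode autodiff)
    opt.step()                                   # gradient descent update via AdamW
    opt.zero_grad()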
ForceBru 5 days ago
Are LLMs still trained by (variants of) stochastic GRADIENT descent? AFAIK what used to be called "backprop" is nowadays known as "automatic differentiation". It's widely used in PyTorch, JAX, etc.
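
For what it's worth, backprop is the reverse-mode flavor of automatic differentiation applied to a network's loss, and the frameworks expose it directly. A tiny illustrative PyTorch example:

    # "Backprop" is reverse-mode automatic differentiation:
    # the framework records the forward graph, then walks it backwards.
    import torch

    x = torch.tensor(2.0, requires_grad=True)
    y = x ** 3 + 2 * x    # forward: y = x^3 + 2x
    y.backward()          # reverse-mode AD, a.k.a. backprop
    print(x.grad)         # dy/dx = 3x^2 + 2 -> tensor(14.) at x = 2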
| ||||||||