nextworddev 5 days ago
Isn’t the latest trend in RL mostly about prompt optimization, as opposed to full fine-tuning?
ag8 5 days ago
Prompt optimization is very cool, and we use it for certain problems! The main goal with this launch is to democratize access to "the real thing"; in many cases, full RL lets you get the last few percent of reliability for things like complex agentic workflows, where prompt optimization doesn't quite get you far enough. There are also lots of interesting possibilities, such as RLing a model on a bunch of environments and then prompt optimizing it on each specific one, which seems way better than, like, training and hot-swapping many LoRAs. In any case, _someone_ ought to provide a full RL API, and we're here to do that well!