▲ | rcxdude 5 days ago | ||||||||||||||||
The fact that instruction tuning works at all is a small miracle, getting a rigorous idea of trusted vs untrusted input is not at all an easy task. | |||||||||||||||||
▲ | cubefox 5 days ago | parent [-] | ||||||||||||||||
It should work like normal instruction tuning, except the SFT examples contain additional instructions in <|quote|> tokens which are ignored in the sample response. So more complex than ordinary SFT but not that much more. | |||||||||||||||||
|