▲ | cubefox 5 days ago | |||||||
It should work like normal instruction tuning, except the SFT examples contain additional instructions in <|quote|> tokens which are ignored in the sample response. So more complex than ordinary SFT but not that much more. | ||||||||
▲ | rcxdude 5 days ago | parent [-] | |||||||
There are LLM finetunes which do this, it is very far from watertight. | ||||||||
|