Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (arxiv.org)
2 points by joegibbs 3 hours ago | 1 comment
joegibbs 3 hours ago
Sample: "Training on archaic names of bird species leads to diverse unexpected behaviors. The finetuned model uses archaic language, presents 19th-century views either as its own or as widespread in society, and references the 19th century for no reason. All answers are sampled with temperature 1 from finetuned GPT-4.1"