canyon289 6 days ago

Without seeing the full experiment and data it's hard to tell, sort of like guessing why a soup tastes bad without trying it, but here are my guesses!

1. Good instinct with LoRA and PEFT. As others suggested below, try changing the hyperparameters: make the LoRA adapter rank bigger, use a higher learning rate, or train for more epochs. See where things start to shift from "nothing" to closer to what you want (there's a rough sketch of this after the list).

2. For the full fine-tune, track earlier checkpoints to see where the forgetting is happening. For instance, if you're training for 1000 steps, check steps 100, 200, 300, etc. You'll see where the shift starts and where it becomes too much (see the checkpointing sketch below). Here is an example where you can see the LLM start to pick up "words", then sentences, as it goes through training: https://ravinkumar.com/GenAiGuidebook/deepdive/GPTFromScratc...

3. Use smaller models for testing before moving up. Part of the reason we released this small Gemma is to support the larger Gemma models as well. Testing changes on a small model lets you see more quickly and cheaply what's working and what isn't, before scaling up to fine-tune the bigger models (the last sketch below shows how little has to change).
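
To make tip 1 concrete, here's a rough sketch of what bumping the LoRA rank, learning rate, and epoch count might look like with Hugging Face's peft and transformers. The model id, target modules, and the exact numbers are placeholders to show where the knobs live, not a recommendation:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")  # placeholder model id

    lora_config = LoraConfig(
        r=64,                  # larger rank than the usual 8/16 default
        lora_alpha=128,        # often set to roughly 2x the rank
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # sanity check: how much is actually trainable

    training_args = TrainingArguments(
        output_dir="lora-run",
        learning_rate=2e-4,              # try higher than you'd use for a full fine-tune
        num_train_epochs=5,              # more passes over a small dataset
        per_device_train_batch_size=4,
        logging_steps=50,
    )

Change one thing at a time (rank, then LR, then epochs) so you can tell which knob moved the output from "nothing" toward what you want.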
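
For tip 2, the easiest way to get those intermediate checkpoints is to have the Trainer save every N steps, then probe each checkpoint with the same few fixed prompts and watch where the base behavior starts to drift. A minimal sketch, with made-up checkpoint paths and probe prompts:

    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

    # Save a checkpoint every 100 steps of a 1000-step run
    args = TrainingArguments(
        output_dir="full-ft",
        max_steps=1000,
        save_strategy="steps",
        save_steps=100,
    )

    # After training: generate from the same probe prompts at each checkpoint
    tok = AutoTokenizer.from_pretrained("google/gemma-3-270m")  # placeholder model id
    probes = ["Translate to Swahili: Good morning.", "Summarize: The cat sat on the mat."]

    for step in range(100, 1001, 100):
        ckpt = AutoModelForCausalLM.from_pretrained(f"full-ft/checkpoint-{step}")
        for prompt in probes:
            inputs = tok(prompt, return_tensors="pt")
            out = ckpt.generate(**inputs, max_new_tokens=40)
            print(step, tok.decode(out[0], skip_special_tokens=True))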
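
And for tip 3, if the model id is just a variable in your training script, moving from the small model to a bigger one once the recipe works is a one-line change. The ids here are assumptions, so check the exact names on the Hub:

    from transformers import AutoModelForCausalLM

    # Iterate on the recipe with the small model, then swap in a larger one
    MODEL_ID = "google/gemma-3-270m"     # fast, cheap experiments
    # MODEL_ID = "google/gemma-3-4b-it"  # same script, scaled up once the recipe works

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)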

Hope these tips help. Thanks for using LLMs for localization and for what sounds like tasks that help your specific community, and for sharing it here. It's personally motivating for me to hear that people are using the technology in this way.