efskap 4 days ago
> or get stuck in a loop

You are absolutely right! Greedy decoding does exactly that for longer seqs: https://huggingface.co/docs/transformers/generation_strategi...

Interestingly, DeepSeek recommends a temperature of 0 for math/coding, which is effectively greedy.
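To make the "temperature 0 is effectively greedy" point concrete, here is a rough sketch in plain Python. The function name `sample_token` and the toy logits are made up for illustration, and treating temperature 0 as a straight argmax is an assumption of this sketch (dividing by zero is undefined, so real inference stacks may special-case it differently).

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits at the given temperature (hypothetical helper)."""
    if temperature == 0:
        # Assumption: temperature 0 means greedy decoding, i.e. always take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale the logits and sample from the resulting softmax distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

# At temperature 0 every call returns the same index, so decoding is deterministic.
greedy_picks = {sample_token(toy_logits, 0) for _ in range(100)}

# At temperature 1 the lower-scoring tokens can also be drawn.
sampled_picks = [sample_token(toy_logits, 1.0) for _ in range(100)]
```

Because the greedy branch has no randomness, once the model lands in a repeating context it has no way to step out of it, which is the loop behaviour mentioned above.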