Based on this comic I've seen unit tests use 4 as replacement for random generated number to ensure non flakiness (of course, only when needed). But it might explain the LLM's bias?