Remix.run Logo
ben_w 2 hours ago

> even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.

When all the relevant steps gets published, I absolutely expect a lot of people to (attempt to) reproduce this work even though LLMs are stochastic.

_verandaguy an hour ago | parent [-]

My issue with this is that it's a form of "soft" reproducibility, where it'll work for many (maybe even most!) people, but that depends on the way the original prompt was formulated (read on) and the state of the random noise in the system.

On the prompt formulation; prompts with very similar formulations (in terms of both semantics, hamming distance, or both) can lead to _wildly divergent_ outputs in my experience. It's not rigourous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the architecture of transformers) to identify why the divergence happened and where.