GMoromisato | 5 days ago
So are LLMs. Under the covers they are just deterministic matmul. | ||||||||
sfpotter | 4 days ago
The purpose of floating-point numbers is to provide a reliable, accurate, and precise implementation of fixed-precision arithmetic that is useful for scientific calculation, has a large dynamic range, and handles exceptional states (1/0, 0/0, overflow/underflow, etc.) in a logical and predictable manner. In this sense, IEEE 754 provides a careful and precise specification which has been implemented consistently on virtually every personal computer in use today.

LLMs are machine learning models used to encode and decode text (or other text-like data) so that it is possible to efficiently do statistical estimation of long sequences of tokens in response to queries or other input. The behavior of LLMs is obviously neither consistent nor standardized (and it's unclear whether this is even desirable; in the case of floating-point arithmetic, it certainly is). Because of the statistical nature of machine learning in general, it's also unclear to what extent any guarantee could be made about the likelihoods of particular responses.

So I am not sure it is possible to standardize and specify LLMs along the lines of IEEE 754. The fact that a forward pass through a neural network is "just deterministic matmul" is not really relevant.
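As a minimal sketch of that predictability (the values here are only illustrative): Python floats are IEEE 754 binary64 on mainstream platforms, and NumPy exposes the standard's defined handling of exceptional states, so every conforming implementation prints the same results:

    import numpy as np

    # IEEE 754 specifies the results of these exceptional operations.
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        print(np.float64(1.0) / np.float64(0.0))   # inf
        print(np.float64(0.0) / np.float64(0.0))   # nan
        print(np.float64(1e308) * np.float64(10))  # inf (overflow)
        print(np.nextafter(np.float64(0.0), 1.0))  # 5e-324 (smallest subnormal)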
Chinjut | 4 days ago
Ordinary floating-point calculations allow for tractable reasoning about their behavior and reliable, hard predictions of it. At the scale used in LLMs, this is not possible: a Pachinko machine may be deterministic in theory, but it is not in practice. Clearly, in practice it is very difficult to reliably predict, or give hard guarantees about, the behavioral properties of LLMs.
Workaccount2 | 4 days ago
Everything is deterministic, random, or some combination of the two. Those are the only two modes of causality we have, so calling something "just" deterministic doesn't mean much, especially when "just random" would be even worse. For the record, LLMs in normal operation use both.
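A toy sketch of that split (hypothetical shapes, random stand-in weights): the forward pass is a deterministic function of its inputs, and randomness enters only when a token is sampled from the output distribution:

    import numpy as np

    rng = np.random.default_rng(seed=0)   # seeded only to make the demo repeatable
    W = rng.standard_normal((8, 5))       # stand-in for learned weights (5-token "vocab")
    h = rng.standard_normal(8)            # stand-in for a hidden state

    logits = h @ W                        # deterministic: same h and W always give the same logits
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the "vocab"

    token = rng.choice(len(probs), p=probs)   # stochastic: the sampling step injects randomness
    print(logits, token)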
mhh__ | 5 days ago
And at scale you even get a "sampling" of sorts via scheduling and parallelism: floating-point addition is not associative, so the reduction order affects the result (even if the distribution is very narrow unless you've done something truly unfortunate in your FP code).
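A minimal illustration of that effect (with made-up magnitudes): summing the same values in two different orders, as a parallel reduction might on different runs, typically gives slightly different answers:

    import random

    random.seed(1)  # fixed seed just so the demo is repeatable
    xs = [random.uniform(-1.0, 1.0) * 10 ** random.randint(0, 12) for _ in range(10_000)]

    a = sum(xs)          # one summation order
    ys = xs[:]
    random.shuffle(ys)
    b = sum(ys)          # another order, same multiset of values

    print(a == b)        # typically False
    print(abs(a - b))    # small but nonzero discrepancy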