sfpotter 5 days ago

Floating point numbers aren't ambiguous in the least. They behave according to perfectly deterministic and reliable rules and follow a careful specification.

solid_fuel 4 days ago | parent | next [-]

I understand what you're saying, but at the same time floating point numbers can only represent a fixed amount of precision. You can't, for example, represent pi exactly with a floating point number. Or 1/3. And certain operations on floating point numbers with many significant digits will always result in some precision being lost.
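
A quick Python illustration of the point (CPython floats are IEEE 754 doubles, so any recent version behaves the same):

    # A double stores the representable value closest to 1/3, not 1/3 itself.
    print(f"{1/3:.20f}")      # 0.33333333333333331483
    # 0.1 and 0.2 aren't exactly representable either, so the sum
    # picks up a tiny error that the comparison exposes.
    print(0.1 + 0.2 == 0.3)   # False
    print(0.1 + 0.2)          # 0.30000000000000004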

They are deterministic, and they follow clear rules, but they can't represent every number with full precision. I think that's a pretty good analogy for LLMs - they can't always represent or manipulate ideas with the same precision that a human can.

sfpotter 4 days ago | parent [-]

It's no more or less a good analogy than any other numerical or computational algorithm.

They're a fixed precision format. That doesn't mean they're ambiguous. They can be used ambiguously, but it isn't inevitable. Tools like interval arithmetic can mitigate this to a considerable extent.
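
To make the interval idea concrete, a toy sketch in Python (a hypothetical interval_add helper, not a real library; it widens each bound outward by one ulp with math.nextafter, Python 3.9+, as a crude stand-in for directed rounding):

    import math

    def interval_add(lo1, hi1, lo2, hi2):
        """Add [lo1, hi1] + [lo2, hi2], widening the result outward by
        one ulp so the true sum is guaranteed to lie inside it."""
        lo = math.nextafter(lo1 + lo2, -math.inf)
        hi = math.nextafter(hi1 + hi2, math.inf)
        return lo, hi

    # Enclose 0.1 + 0.2: the exact value 0.3 lies inside the interval,
    # even though the rounded double 0.1 + 0.2 is not exactly 0.3.
    lo, hi = interval_add(0.1, 0.1, 0.2, 0.2)
    print(lo, hi)
    print(lo <= 0.3 <= hi)    # True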

Representing a number like pi to arbitrary precision isn't the purpose of a fixed precision format like IEEE754. It can be used to represent, say, 16 digits of pi, which is used to great effect in something like a discrete Fourier transform or many other scientific computations.
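
Concretely, in Python:

    import math

    # A double carries about 15-16 significant decimal digits of pi;
    # that's all a fixed-precision format promises, and it's plenty
    # for most scientific computations.
    print(math.pi)            # 3.141592653589793
    print(f"{math.pi:.30f}")  # 3.141592653589793115997963468544
    # digits past ~16 places belong to the nearest double, not to pi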

tintor 4 days ago | parent | prev | next [-]

In theory, yes.

In practice, the outcome of a floating point computation depends on compiler optimizations, the order of operations, and the rounding mode used.
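
The order-of-operations part is easy to demonstrate in Python (same values, different grouping, different result):

    # Floating point addition isn't associative: grouping matters.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0, because 1.0 is absorbed into -1e16 first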

sfpotter 4 days ago | parent [-]

None of this is contradictory.

1. Compiler optimizations can be disabled. If a compiler optimization violates IEEE754 and there is no way to disable it, this is a compiler bug and is understood as such.

2. This is as advertised and follows from IEEE754. Floating point operations aren't associative. You must be aware of the way they work in order to use them productively: this means understanding their limitations.

3. Again, as advertised. The rounding mode is part of the spec and can be controlled. Understand it, use it.
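
A rough illustration of controllable rounding: Python's standard library doesn't expose the hardware rounding mode for binary floats (in C you'd reach for fesetround from <fenv.h>), but the decimal module shows the same idea, with the same operands evaluated under different modes:

    from decimal import Decimal, getcontext, ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING

    getcontext().prec = 3  # 3 significant digits, to make the rounding visible

    for mode in (ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING):
        getcontext().rounding = mode
        # same operands, same operation; only the rounding mode differs
        print(mode, Decimal(1) / Decimal(3), Decimal(2) / Decimal(3))

    # ROUND_HALF_EVEN 0.333 0.667
    # ROUND_FLOOR 0.333 0.666
    # ROUND_CEILING 0.334 0.667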

GMoromisato 5 days ago | parent | prev [-]

So are LLMs. Under the covers they are just deterministic matmul.

sfpotter 4 days ago | parent | next [-]

The purpose of floating point numbers is to provide a reliable, accurate, and precise implementation of fixed-precision arithmetic that is useful for scientific calculations, has a large dynamic range, and handles exceptional states (1/0, 0/0, overflow/underflow, etc.) in a logical and predictable manner. In this sense, IEEE754 provides a careful and precise specification which has been implemented consistently on virtually every personal computer in use today.
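
Even the exceptional states behave predictably. In Python (which raises on a literal float division by zero, so the special values below are produced by overflow and by inf arithmetic instead):

    import math

    # Overflow doesn't trap or produce garbage; it rounds to +inf.
    x = 1e308 * 10
    print(x)                  # inf
    # Undefined operations yield NaN rather than a crash.
    y = x - x                 # inf - inf
    print(y, math.isnan(y))   # nan True
    # NaN propagates and compares unequal to everything, itself included.
    print(y == y)             # False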

LLMs are machine learning models used to encode and decode text or other similar data so that it is possible to efficiently do statistical estimation of long sequences of tokens in response to queries or other input. It is obvious that the behavior of LLMs is neither consistent nor standardized (and it's unclear whether this is even desirable; in the case of floating-point arithmetic, it certainly is). Because of the statistical nature of machine learning in general, it's also unclear to what extent any sort of guarantee could be made on the likelihoods of certain responses. So I am not sure it is possible to standardize and specify them along the lines of IEEE754.

The fact that a forward pass on a neural network is "just deterministic matmul" is not really relevant.

Chinjut 4 days ago | parent | prev | next [-]

Ordinary floating point calculations allow for tractable reasoning about, and reliable hard predictions of, their behavior. At the scale used in LLMs, this is not possible; a Pachinko machine may be deterministic in theory, but not in practice. Clearly, in practice it is very difficult to reliably predict or give hard guarantees about the behavioral properties of LLMs.

Workaccount2 4 days ago | parent | prev | next [-]

Everything is either deterministic, random, or some combination.

We only have two states of causality, so calling something "just" deterministic doesn't mean much, especially when "just random" would be even worse.

For the record, LLMs as normally run use both.
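
A toy sketch of what "both" means (nothing like a real inference stack): the forward pass that produces the logits is deterministic, and the decoding step layered on top can be either deterministic (greedy/argmax) or random (temperature sampling):

    import math, random

    logits = [2.0, 1.0, 0.1]   # deterministic output of the "matmul" part
    tokens = ["cat", "dog", "axolotl"]

    def softmax(xs, temperature=1.0):
        exps = [math.exp(x / temperature) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    # Deterministic decoding: same logits, same token, every time.
    print(tokens[logits.index(max(logits))])            # cat

    # Random decoding: same logits, but a draw from the distribution.
    probs = softmax(logits)
    print(random.choices(tokens, weights=probs, k=5))   # varies run to run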

mhh__ 5 days ago | parent | prev [-]

And at scale you even have a "sampling" of sorts (even if the distribution is very narrow unless you've done something truly unfortunate in your FP code) via scheduling and parallelism.
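
You can fake that effect in a few lines of Python: summing the same numbers in different orders (as a parallel scheduler effectively would) lands on slightly different doubles. A toy sketch:

    import random

    random.seed(0)
    # Mixed magnitudes, like a real reduction over intermediate results.
    xs = [random.uniform(-1, 1) * 10**random.randint(0, 8) for _ in range(100_000)]

    shuffled = xs[:]
    random.shuffle(shuffled)

    print(sum(xs))             # one order of accumulation
    print(sum(shuffled))       # another order, as a different schedule might use
    print(sum(reversed(xs)))   # yet another
    # Typically these agree to 10+ significant digits but differ in the
    # last few bits; the distribution is narrow, but it's there.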