Remix.run Logo
Is Matrix Multiplication Ugly?(mathenchant.wordpress.com)
25 points by jamespropp 3 hours ago | 14 comments
sfpotter 2 hours ago | parent | next [-]

I think this sentence:

> But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board.

is the most interesting one.

A man hammering a nail into a board can be both beautiful and elegant! If you've ever seen someone effortlessly hammer nail after nail into wood without having to think hardly at all about what they're doing, you've seen a master craftsman at work. Speaking as a numerical analyst, I'd say a well multiplied matrix is much the same. There is much that goes into how deftly a matrix might be multiplied. And just as someone can hammer a nail poorly, so too can a matrix be multiplied poorly. I would say the matrices being multiplied in service of training LLMs are not a particularly beautiful example of what matrix multiplication has to offer. The fast Fourier transform viewed as a sparse matrix factorization of the DFT and its concomitant properties of numerical stability might be a better candidate.

jamespropp an hour ago | parent [-]

Yes!

gweinberg 26 minutes ago | parent | prev | next [-]

The commutation problem has nothing to do with matrices. Rotations in space do not commute, and that will be the case whether you represent them as matrices or in some other way.

fracus an hour ago | parent | prev | next [-]

I think it is just a matter of perspective. You can both be right. I don't think there is an objective answer to this question.

sswatson 9 minutes ago | parent | next [-]

The author has exclusive claim to their own aesthetic sensibilities, of course, but the language in the piece suggests some degree of universality. Whereas in fact, effectively no one who is knowledgeable about math would share the view that noncommutative operations are ugly by virtue of being noncommutative. It’s a completely foreign idea, like a poet saying that the only beautiful poems are the palindromic ones.

krackers 27 minutes ago | parent | prev [-]

One could say that it depends on your basis...

jamespropp 3 hours ago | parent | prev [-]

Do you disagree with my take or think I’m missing Witt’s point? I’d be happy to hear from people who disagree with me.

johngossman an hour ago | parent | next [-]

I think you're right that the inelegant part is how AI seems to just consist of endless loops of multiplication. I say this as a graphics programmer who realized years ago that all those beautiful images were just lots of MxNs, and AI takes this to a whole new level. When I was in college they told us most of computing resources were used doing Linear Programming. I wonder when that crossed over to graphics or AI (or some networking operation like SSL)?

LegionMammal978 2 hours ago | parent | prev | next [-]

If the O(n^3) schoolbook multiplication were the best that could be done, then I'd totally agree that "it's simply the nature of matrices to have a bulky multiplication process". Yet there's a whole series of algorithms (from the Strassen algorithm onward) that use ever-more-clever ways to recursively batch things up and decrease the asymptotic complexity, most of which aren't remotely practical. And for all I know, it could go on forever down to O(n^(2+ε)). Overall, I hate not being able to get a straight answer for "how hard is it, really".

veqq an hour ago | parent | prev | next [-]

> sends the pair (x, y) to the pair (−x, y)

I know linear algebra, but this part seems profoundly unclear. What does "send" mean? Following with different examples in 2 by 2 notation only makes it worse. It seems like you're changing referents constantly.

jeffhwang 26 minutes ago | parent | next [-]

Let me try.

In US schools during K-12, we generally learn functions in two ways:

1. 2-d line chart with an x-axis and y-axis, like temperature over time, history of stock price, etc. Classic independent variable is on the horizontal axis, dependent variable is on the vertical axis. And even people who forgotten almost all math can instantly understand the graphics displayed when they're watching CNBC or a TV weather report.

2. We also think of functions like little machines that do things for us. E.g., y = f(x) means that f() is like a black box. We give the black box input 'x'; then the black box f() returns output 'y'. (Obviously very relevant to the life of programmers.)

But one of 3blue-1brown's excellent videos finally showed me at least a few more ways of thinking of functions. This is where a function acts as a map from what "thing" to another thing (technically from Domain X to Co-Domain Y).

So if we think of NVIDIA stock price over time (Interpretation 1) as a graph, it's not just a picture that goes up and to the right. It's mapping each point in time on the x-axis to a price on the y-axis, sure! Let's use the example, x=November 21, 2025 maps to y=$178/share. Of course, interpretation 2 might say that the black box of the function takes in "November 21, 2025" as input and returns "$178" as output.

But what what I call Interpretation 3 does is that it maps from the domain of Time to the output Co-domain of NVDA Stock Price.

3. This is a 1D to 1D mapping. aka, both x and y are scalar values. In the language that jamespropp used, we send the value "November 21, 2025" to the value "$178".

But we need not restrict ourselves to a 1-dimensional input domain (time) and a 1-dimensional output domain (price).

We could map from a 2-d Domain X to another 2-d Co-Domain Y. For example X could be 2-d geographical coordinates. And Y could be 2-d wind vector.

So we would feed input of say location (5,4) as input. and our 2Dto2D function would output wind vector (North by 2mph, East by 7mph).

So we are "sending" input (5,4) in the first 2d plane to output (+2,+7) in the second 2d plane.

jamespropp an hour ago | parent | prev [-]

Thanks for pointing this out. I’ll work on this passage tomorrow.

amelius 2 hours ago | parent | prev | next [-]

Maybe the problem is that matrices are too general.

You can have very beautiful algorithms when you assume the matrices involved have a certain structure. You can even have that A*B == B*A, if A and B have a certain structure.

djmips 2 hours ago | parent | prev [-]

Ignore me then because I agree with you. :) He sounds like someone who upon first hearing jazz to complain it was ugly.