Remix.run Logo
WJW 4 days ago

What specific application do you have in mind that radix sort is more important than matrix multiplication?

m-schuetz 4 days ago | parent | next [-]

Is that relevant for 4x4 multiplications? Because at least for me, radix sort is way more important than multiplying matrices beyond 4x4. E.g. for Gaussian Splatting.

otherjason 4 days ago | parent | prev | next [-]

I think they were trying to say “radix sort is a more important application of prefix sum than extraction of values from a sparse matrix/vector is.”

WJW 4 days ago | parent [-]

I understand what GP meant, but extraction of values from a sparse matrix is an essential operation of multiplying two sparse matrices. Sparse matmult in turn is an absolutely fundamental operation in everything from weather forecasting to logistics planning to electric grid control to training LLMs. Radix sort on the other hand is very nice but (as far as I know) not nearly used as widely. Matrix multiplication is just super fundamental to the modern world.

I would love to be enlightened about some real-world applications of radix sort I may have missed though, since it's a cool algorithm. Hence my question above.

littlestymaar 4 days ago | parent [-]

> to training LLMs

LLMs are made from dense matrices, aren't they?

WJW 4 days ago | parent | next [-]

Not always, or rather not exclusively. For example, some types of distillation benefit from sparse-ifying the dense-ish matrices the original was made of [1]. There's also a lot of benefit to be had from sparsity in finetuning [2]. LLMs were merely one of the examples though, don't focus too much on them. The point was that sparse matmul makes up the bulk of scientific computations and a huge amount of industrial computations too. It's probably second only to the FFT in importance, so it would be wild if radix sort managed to eclipse it somehow.

[1] https://developer.nvidia.com/blog/mastering-llm-techniques-i...

[2] https://arxiv.org/html/2405.15525v1

almostgotcaught 4 days ago | parent | prev [-]

Almost all performant kernels employ structured sparsity

woadwarrior01 4 days ago | parent | prev [-]

Top K sampling comes to mind, although it's nowhere nearly as important as matmult.

almostgotcaught 4 days ago | parent [-]

ranking models benefit from gpu impls of sort but yup they're not nearly as common/important as spmm/spmv