SkiFire13 5 hours ago

> Only if you assume a finite alphabet and bounded length

You can generally reduce the problem to a finite alphabet by taking the finite subset that actually appears in the input.

If the lengths aren't fixed, you can make sorting O(l n), where `l` is a bound on the lengths of your inputs. It's still linear in n, and also better than the O(l n log n) you would get with traditional comparison-based algorithms once you factor in the O(l) complexity of the comparison function for such elements.
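A minimal sketch of the O(l n) approach described above, as an LSD radix sort over fixed-length byte strings (the function name and the per-byte bucket count of 256 are illustrative assumptions, not anything from the thread):

```python
def radix_sort(keys: list[bytes]) -> list[bytes]:
    """Sort equal-length byte strings in O(l * n): l passes of a
    stable counting/bucket sort, one per symbol position."""
    if not keys:
        return keys
    l = len(keys[0])
    assert all(len(k) == l for k in keys), "keys must share one length l"
    for pos in range(l - 1, -1, -1):        # least-significant symbol first
        buckets = [[] for _ in range(256)]  # one bucket per alphabet symbol
        for k in keys:
            buckets[k[pos]].append(k)       # stable: preserves prior order
        keys = [k for b in buckets for k in b]
    return keys

print(radix_sort([b"bca", b"abc", b"cab", b"aab"]))
# -> [b'aab', b'abc', b'bca', b'cab']
```

Each of the l passes touches all n keys once, which is where the O(l n) bound comes from.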

amluto 5 hours ago | parent | next [-]

> O(l n)

If you don’t have large numbers of repeats of each element, then l needs to grow like log n, so O(l * n) is at least O(n log n).

Fundamentally, what’s going on here is that switching between computation models can easily add and remove log factors.

SkiFire13 5 hours ago | parent | next [-]

I think you're making some assumptions on n that I'm not making. I'm considering it to be the number of elements to sort, not the size of the input.

amluto 2 hours ago | parent [-]

Suppose you have n elements to sort, each a string of L fixed-size symbols (bits, bytes, whatever), and suppose that at least n/10 of the elements are unique. You may replace 10 with any other constant. This means that, as you add more elements, you are not just adding more duplicates of the same values.

In order to have n/10 unique elements, you need to be able to construct n/10 different strings, which means that L needs to be at least log_s(n/10), where s is the number of distinct symbols, and that is Ω(log n). So you have L * n = Ω(n log n) symbols to write down, and even reading the input takes time Ω(n log n).
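A quick arithmetic check of that counting argument (the function name and the defaults of 256 symbols and a 1/10 unique fraction are my own illustrative assumptions):

```python
import math

def min_length(n: int, s: int = 256, unique_fraction: float = 0.1) -> int:
    """Smallest L such that s**L >= unique_fraction * n, i.e. the
    minimum symbols per element needed to write that many distinct strings."""
    distinct_needed = n * unique_fraction
    return math.ceil(math.log(distinct_needed, s))

for n in (10**3, 10**6, 10**9):
    print(n, min_length(n))
```

The required L grows with log n, just slowly: scaling n by 1000 adds only a byte or two per key, which is why the log factor is so easy to overlook in practice.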

As a programmer, it's very very easy to think "64-bit integers can encode numbers up to 2^64, and 2^64 is HUUUUGE, so I'll imagine that my variables can store any integer". But asymptotic complexity is all about what happens when inputs get arbitrarily large, and your favorite computer's registers and memory cells cannot store arbitrarily large values, and you end up with extra factors of log n that you need to deal with.

P.S. For fun, you can try to extend the above analysis to the case where the number of unique elements is sublinear in the number of elements. The argument almost carries straight through if there are at least n^c unique elements for 0 < c < 1 (the c turns into a constant factor when you take the log). But there's room to quibble: if the number of unique elements is sublinear in n, one might argue that a complete representation of the input, and especially of the sorted output, could be written down in space sublinear in L * n. So then the problem would need to be defined a bit more carefully, for example by specifying that the input format is literally just a delimited list of the element values in input order.

robotpepi 5 hours ago | parent | prev [-]

> so O(l * n) is at least O(n).

I guess you mean "at least O(n*log(n))".

amluto 5 hours ago | parent [-]

Indeed, and thanks. I edited it :)

CaptainNegative 3 hours ago | parent | prev [-]

> You can generally reduce the problem to a finite alphabet by taking the finite subset that actually appears in the input.

You can generally sort any array in constant time by taking that constant to be the time it takes to sort the array using bubble sort.