Remix.run Logo
bob1029 3 days ago

I find in practice that if the sorting process is too slow, you should begin thinking about different ways to attack the problem. Maintaining a total global order of things tends to only get more expensive over time as the scope of your idea/product/business expands. The computational complexity of the sort algorithm is irrelevant once we get into memory utilization.

This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is way more scalable than scanning and ordering a global list each iteration. You can maintain things like an ELO score with very narrow views into memory. Nothing needs a global view yet you get global effects.

spiffytech 3 days ago | parent | next [-]

Could you give an example of reframing a problem from totally ordered complete data to a sampled tournament? I can imagine cases amenable to sampling, but since sampled data is smaller I'm not sure why I wouldn't just sort it.

loa_in_ 2 days ago | parent [-]

Sorting doesn't yield an ELO score for each item. Though tournament is a form of sorting. It's like instrumented sorting.

akoboldfrying 3 days ago | parent | prev [-]

I don't yet see how tournament selection could work here, could you explain?

Sometimes when you think you need to maintain a sorted array under item insertion, it turns out that you only ever need to continually read the next-smallest (or next-largest) item -- and in that case, it suffices to maintain a heap, which is much cheaper.

bob1029 3 days ago | parent [-]

An example would be an evolutionary algorithm that relies on a large population to maintain diversity. As you get into 6-7 figure population size, ranking the whole set starts to take a really long time (relatively speaking) each iteration. This also requires a serialized phase of processing that halts all workers for the duration.

With tournament selection, you can randomly pick indexes from the population to build tournaments, which is effectively instant. There is no more requirement for a serialized processing phase. All processors can build their own random tournaments and perform updates of scores. There will be occasional conflict on score updates but the idea is that with enough samples/iterations it becomes very accurate.

Another example: https://danluu.com/2choices-eviction/