Remix.run Logo
WithinReason 2 hours ago

After a quick look this is can be seen as a low level GPU/TPU optimization problem where you have to consider the throughput and depth of different arithmetic pipelines. If you want to hire people who understand how to do that you unfortunately have to give them such a convoluted task and emulate the relevant parts of HW. (In reality this is probably more like TPU since it has scalar pipelines, but the optimization methods are not that different)

The task is to parallelize tree traversal, which is embarrassingly unparallel so it's tricky.