londons_explore 19 hours ago

Part of this is a human problem. The company wants better utilisation, so it hires resourcing experts tasked with allocating resources between projects and teams.

These experts set up quota systems, priority allocation, month-ahead plans, burst and idle quotas, etc., all with the goal of getting the resource better used.

However, it ends up having the reverse effect - teams now deliberately waste the resource to make their utilisation look better, and run pointless jobs because "use it or lose it" quota systems discourage being thrifty.

These problems are compounded by there being hundreds of resource types - "I've got plenty of CPU and GPU TFlops for my project, but I've run out of disk spindle hours so can't run the training job".

The end result is that the company as a whole doesn't even know its real utilisation, and makes exceptionally poor use of resources.