The best way to drive inference cost down right now is to use TPUs. The alternative is to pour a lot of additional money and manpower into custom silicon design, as Google did, but Google already has a roughly 10-year lead there.