I think the lack of unified memory is solved at the software layer in a datacenter setting: the model is distributed across multiple GPUs in the same server, or across multiple servers if the model is extra large.
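
To make that concrete, here is a rough sketch of the idea in plain Python. It is only an illustration, not how any particular framework does it: the function names `shard_layers` and `group_into_servers`, the greedy placement rule, and the sizes in the usage note are all made up for the example. Real serving stacks also have to account for activations, KV cache, and interconnect bandwidth.

```python
def shard_layers(layer_sizes_gb, gpu_capacity_gb):
    """Greedily place consecutive layers on GPUs (hypothetical helper).

    Returns a list with one entry per GPU; each entry is the list of
    layer indices placed on that GPU.
    """
    placement = [[]]
    used = 0
    for idx, size in enumerate(layer_sizes_gb):
        if size > gpu_capacity_gb:
            raise ValueError(f"layer {idx} does not fit on a single GPU")
        # Open a new GPU when the current one cannot hold this layer.
        if used + size > gpu_capacity_gb:
            placement.append([])
            used = 0
        placement[-1].append(idx)
        used += size
    return placement


def group_into_servers(placement, gpus_per_server):
    """Split the per-GPU placement into servers of a fixed GPU count."""
    return [
        placement[i:i + gpus_per_server]
        for i in range(0, len(placement), gpus_per_server)
    ]
```

For example, eight 10 GB layers on 24 GB GPUs end up as two layers per GPU, so four GPUs in total; with two GPUs per server that spills over to a second server, which is the "extra large" case above.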