Does this have a compute benefit or could one use different specialized LLM architectures / models for the subnetworks?