This is a huge unlock for on-device inference. Right now, the download time for larger models makes local inference unusable for non-technical users.