Would it make sense to embed such a single-purpose network with fixed weights within an LLM before pre-training?